Abstract

We formalize and study the problem of learning the structure and parameters of 
graphical games from strictly behavioral data. We cast the problem as a maximum 
likelihood estimation based on a generative model defined by the pure-strategy Nash 
equilibria of the game. The formulation brings out the interplay between goodness-of-fit
and model complexity: good models capture the equilibrium behavior represented
in the data while controlling the true number of equilibria, including those potentially 
unobserved. We provide a generalization bound for maximum likelihood estimation. We 
discuss several optimization algorithms including convex loss minimization, sigmoidal 
approximations and exhaustive search. We formally prove that games in our hypothesis 
space have a small true number of equilibria, with high probability; thus, convex loss 
minimization is sound. We illustrate our approach, and show and discuss promising results
on synthetic data and on the U.S. congressional voting records.



1 Introduction 



Graphical games [Kearns et al., 2001] were one of the first and most influential graphical
models for game theory. It has been about a decade since their introduction to the AI
community. There has also been considerable progress on problems of computing classical
equilibrium solution concepts such as Nash [Nash, 1951] and correlated equilibria [Aumann, 1974]
in graphical games (see, e.g., Kearns et al. [2001], Vickrey and Koller [2002], Ortiz and Kearns [2002],
Blum et al. [2006], Kakade et al. [2003], Papadimitriou and Roughgarden [2008], Jiang and
Leyton-Brown [2011] and the references therein). Indeed, graphical games played a prominent
role in establishing the computational complexity of computing Nash equilibria in general
normal-form games (see, e.g., Daskalakis et al. [2009] and the references therein).

Relatively less attention has been paid to the problem of learning the structure of graphical
games from data. Addressing this problem is essential to the development, potential
use and success of game-theoretic models in practical applications.

Indeed, we are beginning to see an increase in the availability of data collected from
processes that are the result of deliberate actions of agents in complex systems. A lot of this
data results from the interaction of a large number of individuals, be they people, companies,
governments, groups or engineered autonomous systems (e.g., autonomous trading agents),
for which any form of global control is usually weak. The Internet is currently a major
source of such data, and the smart grid, with its trumpeted ability to allow individual
customers to install autonomous control devices and systems for electricity demand, will
likely be another one in the near future.

We present a formal framework and design algorithms for learning the structure and
parameters of graphical games [Kearns et al., 2001] in large populations of agents. We
concentrate on learning from purely behavioral data. We expect that, in most cases, the
parameters quantifying a utility function or best-response condition are unavailable and
hard to determine in real-world settings. The availability of data resulting from the
observation of individuals' public behavior is arguably a weaker assumption than the availability
of individual utility observations, which are often private.

Our technical contributions include a novel generative model of behavioral data in
Section 4 for general games. We define identifiability and triviality of games. We provide
conditions which ensure identifiability among non-trivial games. We then present the
maximum likelihood problem for general (non-trivial identifiable) games. In Section 5 we
show a generalization bound for the maximum likelihood problem as well as an upper bound
on the VC-dimension of influence games. In Section 6 we approximate the original problem
by maximizing the number of observed equilibria in the data, suitable for a hypothesis space
of games with small true number of equilibria. We then present our convex loss minimization
approach and a baseline sigmoidal approximation for (linear) influence games. We also
present exhaustive search methods for both general and influence games. In Section 7
we define absolute-indifference of players and show that our convex loss minimization
approach produces games in which all players are non-absolutely-indifferent. We provide a
distribution-free bound which shows that linear influence games have small true number of
equilibria with high probability.



2 Related Work 



Our work complements the recent line of work on learning graphical games [Vorobeychik
et al., 2005, Ficici et al., 2008, Duong et al., 2009, Gao and Pfeffer, 2010, Ziebart et al.,
2010, Waugh et al., 2011]. With the exception of Ziebart et al. [2010] and Waugh et al. [2011],
previous methods assume that the actions as well as corresponding payoffs (or noisy samples
from the true payoff function) are observed in the data. Another notable exception is
a recently proposed framework from the learning theory community to model collective
behavior [Kearns and Wortman, 2008]. The approach taken there considers dynamics and
is based on stochastic models. Our work differs from methods that assume that the game is
known [Wright and Leyton-Brown, 2010]. The experimental validations in previous work
[Duong et al., 2009, Gao and Pfeffer, 2010, Ziebart et al., 2010, Waugh et al., 2011] involve
games of up to 13 players.



In this paper, we assume that the joint-actions are the only observable information. To
the best of our knowledge, we present the first techniques for learning the structure and
parameters of large-population graphical games from joint-actions only. Furthermore, we
present experimental validation in games of up to 100 players. Our convex loss minimization
approach could potentially be applied to larger problems since it is polynomial-time.

There has been a significant amount of work on learning the structure of probabilistic
graphical models from data. We mention only a few references that follow a maximum
likelihood approach for Markov random fields [Lee et al., 2006], bounded tree-width
distributions [Chow and Liu, 1968, Srebro, 2001], Ising models [Wainwright et al., 2006,
Banerjee et al., 2008, Höfling and Tibshirani, 2009], Gaussian graphical models [Banerjee
et al., 2006], Bayesian networks [Guo and Schuurmans, 2006, Schmidt et al., 2007b] and
directed cyclic graphs [Schmidt and Murphy, 2009].



Our approach learns the structure and parameters of games by maximum likelihood 
estimation on a related probabilistic model. Our probabilistic model does not fit into any of 
the types described above. Although a (directed) graphical game has a directed cyclic graph, 
there is a semantic difference with respect to graphical models. Structure in a graphical 
model implies a factorization of the probabilistic model. In a graphical game, the graph 
structure implies strategic dependence between players, and has no immediate probabilistic 
implication. Furthermore, our general model differs from Schmidt and Murphy [2009] since
our generative model does not decompose as a product of potential functions.



3 Background 



In classical game theory (see, e.g., Fudenberg and Tirole [1991] for a textbook introduction),
a normal-form game is defined by a set of players $V$ (e.g., we can let $V = \{1, \ldots, n\}$ if
there are $n$ players), and for each player $i$, a set of actions, or pure strategies, $A_i$, and a
payoff function $u_i : \times_{j \in V} A_j \to \mathbb{R}$ mapping the joint-actions of all the players, given by
the Cartesian product $\mathcal{A} \equiv \times_{j \in V} A_j$, to a real number. In non-cooperative game theory
we assume players are greedy, rational and act independently, by which we mean that each
player $i$ always wants to maximize their own utility, subject to the actions selected by others,
irrespective of how the optimal action chosen helps or hurts others.

A core solution concept in non-cooperative game theory is that of a Nash equilibrium.
A joint-action $x^* \in \mathcal{A}$ is a pure-strategy Nash equilibrium of a non-cooperative game if, for
each player $i$, $x_i^* \in \arg\max_{x_i \in A_i} u_i(x_i, x_{-i}^*)$; that is, $x^*$ constitutes a mutual best-response:
no player $i$ has any incentive to unilaterally deviate from the prescribed action $x_i^*$, given
the joint-action of the other players $x_{-i}^* \in \times_{j \in V - \{i\}} A_j$ in the equilibrium.

In what follows, we denote a game by $\mathcal{G}$, and the set of all pure-strategy Nash equilibria
of $\mathcal{G}$ by:

$$\mathcal{NE}(\mathcal{G}) = \{x^* \mid (\forall i \in V)\ x_i^* \in \arg\max_{x_i \in A_i} u_i(x_i, x_{-i}^*)\} \tag{1}$$
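As a concrete reading of eq. (1), the following minimal Python sketch enumerates the pure-strategy Nash equilibria of a small finite normal-form game by brute force, checking the mutual best-response condition player by player. The function name `pure_nash_equilibria` and the prisoner's dilemma payoffs are illustrative assumptions, not part of the original text; the enumeration visits every joint-action, so it is only meant for tiny games.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

JointAction = Tuple[int, ...]


def pure_nash_equilibria(
    action_sets: Sequence[Sequence[int]],
    payoffs: Sequence[Callable[[JointAction], float]],
) -> List[JointAction]:
    """Brute-force version of eq. (1): keep x* if every x*_i is a best response."""
    equilibria = []
    for x in product(*action_sets):
        is_equilibrium = True
        for i, actions in enumerate(action_sets):
            # Best payoff player i could obtain by deviating unilaterally from x.
            best = max(payoffs[i](x[:i] + (a,) + x[i + 1:]) for a in actions)
            if payoffs[i](x) < best:
                is_equilibrium = False
                break
        if is_equilibrium:
            equilibria.append(x)
    return equilibria


# Hypothetical example: prisoner's dilemma, action 0 = cooperate, 1 = defect.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
pd_payoffs = [lambda x, i=i: PD[x][i] for i in range(2)]

if __name__ == "__main__":
    print(pure_nash_equilibria([[0, 1], [0, 1]], pd_payoffs))  # [(1, 1)]
```

Mutual defection is the only pure-strategy equilibrium here; a game such as matching pennies would return an empty list, i.e. $\mathcal{NE}(\mathcal{G}) = \emptyset$.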



A (directed) graphical game is a game-theoretic graphical model [Kearns et al., 2001].
It provides a succinct representation of normal-form games. In a graphical game, we have a
(directed) graph $G = (V, E)$ in which each node in $V$ corresponds to a player in the game.
The interpretation of the edges/arcs $E$ of $G$ is that the payoff function of player $i$ is only a
function of the set of parents/neighbors $\mathcal{N}_i \equiv \{j \mid (i, j) \in E\}$ in $G$ (i.e., the set of players
corresponding to nodes that point to the node corresponding to player $i$ in the graph). In
the context of a graphical game, we refer to the $u_i$'s as the local payoff functions/matrices.
Linear influence games [Irfan and Ortiz, 2011] are a sub-class of graphical games. For
linear influence games, we assume that we are given a matrix of influence weights $W \in \mathbb{R}^{n \times n}$,
with $\mathrm{diag}(W) = 0$, and a threshold vector $b \in \mathbb{R}^n$. For each player $i$, we define the
influence function $f_i(x_{-i}) \equiv \sum_{j \in \mathcal{N}_i} w_{ij} x_j - b_i = w_{i,-i}^T x_{-i} - b_i$ and the payoff function
$u_i(x) \equiv x_i f_i(x_{-i})$. We further assume binary actions: $A_i = \{-1, +1\}$ for all $i$. The best
response $x_i^*$ of player $i$ to the joint-action $x_{-i}$ of the other players is defined as:

$$\begin{cases} w_{i,-i}^T x_{-i} > b_i \Rightarrow x_i^* = +1, \\ w_{i,-i}^T x_{-i} < b_i \Rightarrow x_i^* = -1, \\ w_{i,-i}^T x_{-i} = b_i \Rightarrow x_i^* \in \{-1, +1\} \end{cases} \quad\Leftrightarrow\quad x_i^*(w_{i,-i}^T x_{-i} - b_i) \ge 0 \tag{2}$$



Hence, for any other player $j$, $w_{ij} \in \mathbb{R}$ can be thought of as a weight parameter quantifying
the "influence factor" that $j$ has on $i$, and $b_i \in \mathbb{R}$ as a threshold parameter quantifying the level of
"tolerance" that player $i$ has for playing $-1$.

As discussed in Irfan and Ortiz [2011], linear influence games are also a sub-class of
polymatrix games [Janovskaja, 1968]. Furthermore, in the special case of $b = 0$ and symmetric
$W$, a linear influence game becomes a party-affiliation game [Fabrikant et al., 2004].
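The best-response condition in eq. (2) turns equilibrium checking for a linear influence game into a sign test: $x$ is a pure-strategy Nash equilibrium if and only if $x_i(w_{i,-i}^T x_{-i} - b_i) \ge 0$ for every player $i$. A minimal NumPy sketch follows, assuming $W$ is stored as a dense $n \times n$ array with zero diagonal; the function name `lig_nash_equilibria` and the small tolerance `tol` for floating-point ties are our own choices, and the brute-force loop over all $2^n$ joint-actions is only practical for small $n$.

```python
import itertools

import numpy as np


def lig_nash_equilibria(W: np.ndarray, b: np.ndarray, tol: float = 1e-12) -> set:
    """Enumerate NE(G) of a linear influence game G = (W, b) via eq. (2)."""
    n = len(b)
    assert W.shape == (n, n) and np.allclose(np.diag(W), 0.0), "need diag(W) = 0"
    equilibria = set()
    for x in itertools.product((-1, 1), repeat=n):
        xv = np.array(x, dtype=float)
        # Since diag(W) = 0, (W @ xv)[i] equals w_{i,-i}^T x_{-i}.
        margins = xv * (W @ xv - b)
        # A zero margin means player i is indifferent, so both actions are best responses.
        if np.all(margins >= -tol):
            equilibria.add(x)
    return equilibria


if __name__ == "__main__":
    # Two players who each prefer to match the other (hypothetical weights).
    W = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(sorted(lig_nash_equilibria(W, np.zeros(2))))  # [(-1, -1), (1, 1)]
    # W = 0, b = 0: every player is indifferent, so all 2^3 joint-actions are equilibria.
    print(len(lig_nash_equilibria(np.zeros((3, 3)), np.zeros(3))))  # 8
```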



Figure 1 provides a preview illustration of the application of our approach to congressional
voting.



4 Preliminaries 



Our goal is to learn the structure and parameters of a graphical game from observed
joint-actions. Note that our problem is unsupervised, i.e. we do not know a priori which
joint-actions are equilibria and which ones are not. If our only goal were to find a game $\mathcal{G}$ in
which all the given observed data is an equilibrium, then a "dummy" influence game with
$\mathcal{G} = (W, b)$, $W = 0$, $b = 0$ would be the optimal solution since $|\mathcal{NE}(\mathcal{G})| = 2^n$. In this
section, we present a probabilistic formulation that allows finding games that maximize the
empirical proportion of equilibria in the data while keeping the true proportion of equilibria
as low as possible. Furthermore, we show that trivial games such as $W = 0$, $b = 0$ obtain
the lowest log-likelihood.



4.1 On the Identifiability of Games 

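Several games with different coefficients can lead to the same Nash equilibria set. As a simple
example that illustrates the issue of identifiability, consider the three following influence
games with the same Nash equilibria sets, i.e. $\mathcal{NE}(W_k, 0) = \{(-1, -1, -1), (+1, +1, +1)\}$
for $k = 1, 2, 3$:

$$W_1 = \begin{bmatrix} 0 & 0 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad W_2 = \begin{bmatrix} 0 & 0 & 0 \\ 2 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad W_3 = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$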
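The claim that $W_1$, $W_2$ and $W_3$ induce the same equilibria set can be checked mechanically by enumerating all $2^3$ joint-actions with eq. (2). The self-contained sketch below uses a small brute-force helper (the name `nash_set` is hypothetical) and also counts the nonzero off-diagonal weights, i.e. the arcs, which is the structural criterion behind preferring the sparser models.

```python
import itertools

import numpy as np


def nash_set(W: np.ndarray, b: np.ndarray) -> frozenset:
    """All x in {-1,+1}^n with x_i (w_{i,-i}^T x_{-i} - b_i) >= 0 for every i."""
    n = len(b)
    return frozenset(
        x
        for x in itertools.product((-1, 1), repeat=n)
        if np.all(np.array(x) * (W @ np.array(x) - b) >= 0)
    )


W1 = np.array([[0, 0, 0], [0.5, 0, 0], [0, 1, 0]])
W2 = np.array([[0, 0, 0], [2, 0, 0], [1, 0, 0]])
W3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
b = np.zeros(3)

if __name__ == "__main__":
    for name, W in [("W1", W1), ("W2", W2), ("W3", W3)]:
        # Same equilibria set, different numbers of arcs (2, 2 and 6).
        print(name, sorted(nash_set(W, b)), "arcs:", np.count_nonzero(W))
```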



Clearly, using structural properties alone, one would generally prefer the former two models
to the latter (two arcs each, versus six arcs in $W_3$), all else being equal (e.g., generalization
performance). A large part of the econometrics literature concerns the issue of identifiability
of models from data. In typical machine-learning fashion, we side-step this issue by measuring
the quality of our data-induced models via their generalization ability and invoke the principle
of Ockham's razor to bias our search toward simpler models using well-known and well-studied
regularization techniques. In particular, we take the view that games are identifiable by their
Nash equilibria. Hence our next definition.

Definition 1. We say that two games $\mathcal{G}_1$ and $\mathcal{G}_2$ are equivalent if and only if their Nash
equilibria sets are identical, i.e.: $\mathcal{G}_1 \equiv_{\mathcal{NE}} \mathcal{G}_2 \Leftrightarrow \mathcal{NE}(\mathcal{G}_1) = \mathcal{NE}(\mathcal{G}_2)$.

[Figure: learned influence graph, with node groups labeled REPUBLICANS and DEMOCRATS]

Figure 1: 110th US Congress's Linear Influence Game (January 3, 2007-09). We provide an
illustration of the application of our approach to real congressional voting data. Irfan and Ortiz [2011]
use such LIGs to address a variety of computational problems, including the identification of most
influential senators. We show the graph connectivity of a LIG learnt by independent ℓ1-regularized logistic
regression (see Sect. 6.5). We highlight some characteristics of the graph, consistent with anecdotal
evidence. First, senators are more likely to be influenced by members of the same party than by members
of the opposite party (the dashed green line denotes the separation between the parties). Republicans
were "more strongly united" (tighter connectivity) than Democrats at the time. Second, the current US
Vice President Biden (Dem./Delaware) and McCain (Rep./Arizona) are displayed at the "extreme of each
party" (Biden at the bottom-right corner, McCain at the bottom-left), reflecting their opposite ideologies.
Third, note that Biden, McCain, the current US President Obama (Dem./Illinois) and US Secretary of
State Hillary Clinton (Dem./New York) have very few outgoing arcs; e.g., Obama only directly influences
Feingold (Dem./Wisconsin), a prominent senior member with strongly liberal stances. One may wonder why
such prominent senators seem to have so little direct influence on others. A possible explanation is
that US President Bush was about to complete his second term (the maximum allowed). Both parties had
very long presidential primaries. All those senators contended for the presidential candidacy within their
parties. Hence, one may posit that those senators were focusing on running their campaigns and that their
influence in the day-to-day business of congress was channeled through other prominent senior members of
their parties.

4.2 Generative Model of Behavioral Data 

We propose the following generative model for behavioral data based strictly in the context
of "simultaneous"/one-shot play in non-cooperative game theory. Let $\mathcal{G}$ be a game. With
some probability $0 < q < 1$, a joint-action $x$ is chosen uniformly at random from $\mathcal{NE}(\mathcal{G})$;
otherwise, $x$ is chosen uniformly at random from its complement set $\{-1, +1\}^n - \mathcal{NE}(\mathcal{G})$.
Hence, the generative model is a mixture model with mixture parameter $q$ corresponding
to the probability that a stable outcome (i.e. a Nash equilibrium) of the game is observed.
Formally, the probability mass function (PMF) over joint-behaviors $\{-1, +1\}^n$ parametrized
by $(\mathcal{G}, q)$ is:

$$p_{(\mathcal{G},q)}(x) = q\,\frac{1[x \in \mathcal{NE}(\mathcal{G})]}{|\mathcal{NE}(\mathcal{G})|} + (1 - q)\,\frac{1[x \notin \mathcal{NE}(\mathcal{G})]}{2^n - |\mathcal{NE}(\mathcal{G})|} \tag{3}$$

where we can think of $q$ as the "signal" level, and thus $1 - q$ as the "noise" level in the data
set.
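Eq. (3) is straightforward to evaluate and to sample from once $\mathcal{NE}(\mathcal{G})$ is available as an explicit set. The sketch below assumes joint-actions are tuples in $\{-1,+1\}^n$ and that the game is non-trivial ($0 < |\mathcal{NE}(\mathcal{G})| < 2^n$); the function names `pmf` and `sample` are illustrative and not taken from the text.

```python
import itertools
import random
from typing import FrozenSet, List, Tuple

JointAction = Tuple[int, ...]


def pmf(x: JointAction, ne: FrozenSet[JointAction], q: float, n: int) -> float:
    """p_(G,q)(x) from eq. (3) for a non-trivial game with equilibria set `ne`."""
    k = len(ne)
    assert 0 < k < 2 ** n and 0 < q < 1
    return q / k if x in ne else (1 - q) / (2 ** n - k)


def sample(ne: FrozenSet[JointAction], q: float, n: int, m: int, seed: int = 0) -> List[JointAction]:
    """Draw m joint-actions: uniform over NE(G) w.p. q, else uniform over its complement."""
    rng = random.Random(seed)
    ne_list = sorted(ne)
    non_ne = [x for x in itertools.product((-1, 1), repeat=n) if x not in ne]
    return [rng.choice(ne_list) if rng.random() < q else rng.choice(non_ne) for _ in range(m)]


if __name__ == "__main__":
    ne = frozenset({(-1, -1, -1), (1, 1, 1)})
    total = sum(pmf(x, ne, 0.9, 3) for x in itertools.product((-1, 1), repeat=3))
    print(round(total, 12))  # 1.0
    data = sample(ne, 0.9, 3, 10_000)
    print(sum(x in ne for x in data) / len(data))  # close to 0.9
```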

Remark 2. Note that in order for eq. (3) to be a valid PMF for any $\mathcal{G}$, we need to enforce
the following conditions: $|\mathcal{NE}(\mathcal{G})| = 0 \Rightarrow q = 0$ and $|\mathcal{NE}(\mathcal{G})| = 2^n \Rightarrow q = 1$. Furthermore,
note that in both cases ($|\mathcal{NE}(\mathcal{G})| \in \{0, 2^n\}$) the PMF becomes a uniform distribution. On
the other hand, if $0 < |\mathcal{NE}(\mathcal{G})| < 2^n$ then setting $q \in \{0, 1\}$ leads to an invalid PMF.

Let $\pi(\mathcal{G})$ be the true proportion of equilibria in the game $\mathcal{G}$ relative to all possible
joint-actions, i.e.:

$$\pi(\mathcal{G}) \equiv |\mathcal{NE}(\mathcal{G})| / 2^n \tag{4}$$

Definition 3. We say that a game $\mathcal{G}$ is trivial if and only if $|\mathcal{NE}(\mathcal{G})| \in \{0, 2^n\}$ (or
equivalently $\pi(\mathcal{G}) \in \{0, 1\}$), and non-trivial if and only if $0 < |\mathcal{NE}(\mathcal{G})| < 2^n$ (or equivalently
$0 < \pi(\mathcal{G}) < 1$).

The following propositions establish that the condition $q > \pi(\mathcal{G})$ ensures that the
probability of an equilibrium is strictly greater than that of a non-equilibrium. The condition also
guarantees identifiability among non-trivial games.

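Proposition 4. Given a non-trivial game $\mathcal{G}$, the mixture parameter $q > \pi(\mathcal{G})$ if and only
if $p_{(\mathcal{G},q)}(x_1) > p_{(\mathcal{G},q)}(x_2)$ for any $x_1 \in \mathcal{NE}(\mathcal{G})$ and $x_2 \notin \mathcal{NE}(\mathcal{G})$.

Proof. Note that $p_{(\mathcal{G},q)}(x_1) = q/|\mathcal{NE}(\mathcal{G})| > p_{(\mathcal{G},q)}(x_2) = (1 - q)/(2^n - |\mathcal{NE}(\mathcal{G})|) \Leftrightarrow q >
|\mathcal{NE}(\mathcal{G})|/2^n$, and given eq. (4), we prove our claim. □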

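Proposition 5. Let $\mathcal{G}_1$ and $\mathcal{G}_2$ be two non-trivial games. For some mixture parameter
$q > \max(\pi(\mathcal{G}_1), \pi(\mathcal{G}_2))$, $\mathcal{G}_1$ and $\mathcal{G}_2$ are equivalent if and only if they induce the same PMF
over the joint-action space $\{-1, +1\}^n$ of the players, i.e.: $\mathcal{G}_1 \equiv_{\mathcal{NE}} \mathcal{G}_2 \Leftrightarrow (\forall x)\ p_{(\mathcal{G}_1,q)}(x) = p_{(\mathcal{G}_2,q)}(x)$.

Proof. Let $\mathcal{NE}_k \equiv \mathcal{NE}(\mathcal{G}_k)$. First, we prove the $\Rightarrow$ direction. By Definition 1, $\mathcal{G}_1 \equiv_{\mathcal{NE}} \mathcal{G}_2 \Rightarrow \mathcal{NE}_1 = \mathcal{NE}_2$. Note that $p_{(\mathcal{G}_k,q)}(x)$ in eq. (3) depends only on characteristic functions
$1[x \in \mathcal{NE}_k]$. Therefore, $(\forall x)\ p_{(\mathcal{G}_1,q)}(x) = p_{(\mathcal{G}_2,q)}(x)$.

Second, we prove the $\Leftarrow$ direction by contradiction. Assume $(\exists x)\ x \in \mathcal{NE}_1 \wedge x \notin \mathcal{NE}_2$.
$p_{(\mathcal{G}_1,q)}(x) = p_{(\mathcal{G}_2,q)}(x)$ implies that $q/|\mathcal{NE}_1| = (1 - q)/(2^n - |\mathcal{NE}_2|) \Rightarrow q = |\mathcal{NE}_1|/(2^n +
|\mathcal{NE}_1| - |\mathcal{NE}_2|)$. Since $q > \max(\pi(\mathcal{G}_1), \pi(\mathcal{G}_2)) \Rightarrow q > \max(|\mathcal{NE}_1|, |\mathcal{NE}_2|)/2^n$ by eq. (4),
we get $\max(|\mathcal{NE}_1|, |\mathcal{NE}_2|)/2^n < |\mathcal{NE}_1|/(2^n + |\mathcal{NE}_1| - |\mathcal{NE}_2|)$. If we assume that $|\mathcal{NE}_1| \ge
|\mathcal{NE}_2|$ we reach the contradiction $|\mathcal{NE}_1| - |\mathcal{NE}_2| < 0$. If we assume that $|\mathcal{NE}_1| < |\mathcal{NE}_2|$ we
reach the contradiction $(2^n - |\mathcal{NE}_2|)(|\mathcal{NE}_2| - |\mathcal{NE}_1|) < 0$. □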

Remark 6. Recall that a trivial game induces a uniform PMF by Remark 2. Therefore,
a non-trivial game is not equivalent to a trivial game since by Proposition 4 non-trivial
games do not induce uniform PMFs.






4.3 Learning the Structure of Games via Maximum Likelihood Estimation



The learning problem consists of estimating the structure and parameters of a graphical
game from data. We point out that our problem is unsupervised, i.e. we do not know a
priori which joint-actions are equilibria and which ones are not. We base our framework
on the fact that games are identifiable with respect to their induced PMF by Proposition 5.

First, we introduce a shorthand notation for the Kullback-Leibler (KL) divergence
between two Bernoulli distributions parametrized by $0 \le p_1 \le 1$ and $0 < p_2 < 1$:

$$KL(p_1 \| p_2) \equiv KL(\mathrm{Bernoulli}(p_1) \| \mathrm{Bernoulli}(p_2)) = p_1 \log\frac{p_1}{p_2} + (1 - p_1)\log\frac{1 - p_1}{1 - p_2} \tag{5}$$

Using this function, we can derive the following expression of the maximum likelihood 
estimation problem. 

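Lemma 7. Given a dataset $\mathcal{D} = x^{(1)}, \ldots, x^{(m)}$, let $\hat\pi(\mathcal{G})$ be the empirical proportion of
equilibria, i.e. the proportion of samples in $\mathcal{D}$ that are equilibria of $\mathcal{G}$:

$$\hat\pi(\mathcal{G}) \equiv \frac{1}{m}\sum_l 1[x^{(l)} \in \mathcal{NE}(\mathcal{G})] \tag{6}$$

the maximum likelihood estimation problem for the probabilistic model in eq. (3) can be
expressed as:

$$\max_{(\mathcal{G},q) \in \Upsilon} \hat{\mathcal{L}}(\mathcal{G}, q), \qquad \hat{\mathcal{L}}(\mathcal{G}, q) = KL(\hat\pi(\mathcal{G}) \| \pi(\mathcal{G})) - KL(\hat\pi(\mathcal{G}) \| q) - n\log 2 \tag{7}$$

where $\mathcal{H}$ is the class of games of interest, $\Upsilon = \{(\mathcal{G}, q) \mid \mathcal{G} \in \mathcal{H} \wedge 0 < \pi(\mathcal{G}) < q < 1\}$ is
the hypothesis space of non-trivial identifiable games, $\pi(\mathcal{G})$ is defined as in eq. (4) and the
optimal mixture parameter is $\hat q = \min(\hat\pi(\mathcal{G}), 1 - \frac{1}{2m})$.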

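Proof. Let $\mathcal{NE} \equiv \mathcal{NE}(\mathcal{G})$, $\pi \equiv \pi(\mathcal{G})$ and $\hat\pi \equiv \hat\pi(\mathcal{G})$. First, for a non-trivial $\mathcal{G}$,
$\log p_{(\mathcal{G},q)}(x^{(l)}) = \log\frac{q}{|\mathcal{NE}|}$ for $x^{(l)} \in \mathcal{NE}$, and $\log p_{(\mathcal{G},q)}(x^{(l)}) = \log\frac{1 - q}{2^n - |\mathcal{NE}|}$ for $x^{(l)} \notin \mathcal{NE}$. The average log-likelihood is

$$\hat{\mathcal{L}}(\mathcal{G}, q) = \frac{1}{m}\sum_l \log p_{(\mathcal{G},q)}(x^{(l)}) = \hat\pi\log\frac{q}{|\mathcal{NE}|} + (1 - \hat\pi)\log\frac{1 - q}{2^n - |\mathcal{NE}|} = \hat\pi\log\frac{q}{\pi} + (1 - \hat\pi)\log\frac{1 - q}{1 - \pi} - n\log 2.$$

By adding $0 = -\hat\pi\log\hat\pi + \hat\pi\log\hat\pi - (1 - \hat\pi)\log(1 - \hat\pi) + (1 - \hat\pi)\log(1 - \hat\pi)$, this can be rewritten as

$$\hat{\mathcal{L}}(\mathcal{G}, q) = \hat\pi\log\frac{\hat\pi}{\pi} + (1 - \hat\pi)\log\frac{1 - \hat\pi}{1 - \pi} - \hat\pi\log\frac{\hat\pi}{q} - (1 - \hat\pi)\log\frac{1 - \hat\pi}{1 - q} - n\log 2,$$

and by using eq. (5) we prove our claim.

Note that by maximizing with respect to the mixture parameter $q$ and by properties of
the KL divergence, we get $KL(\hat\pi \| q) = 0 \Leftrightarrow q = \hat\pi$. We define our hypothesis space $\Upsilon$ given
the conditions in Remark 2 and Propositions 4 and 5. For the case $\hat\pi = 1$, we "shrink" the
optimal mixture parameter to $1 - \frac{1}{2m}$ in order to avoid generating an invalid PMF as
discussed in Remark 2. □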
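Lemma 7 can be sanity-checked numerically: the average log-likelihood computed directly from eq. (3) should coincide with the KL form of eq. (7), and its maximizer over $q$ should be $\hat q = \hat\pi(\mathcal{G})$ (shrunk to $1 - \frac{1}{2m}$ when $\hat\pi(\mathcal{G}) = 1$). The sketch below assumes natural logarithms and a non-trivial game; the helper names and the toy data set are ours.

```python
import math
from typing import FrozenSet, Sequence, Tuple

JointAction = Tuple[int, ...]


def kl_bernoulli(p1: float, p2: float) -> float:
    """Eq. (5), with the convention 0 log 0 = 0 for the p1 terms."""
    out = 0.0
    if p1 > 0:
        out += p1 * math.log(p1 / p2)
    if p1 < 1:
        out += (1 - p1) * math.log((1 - p1) / (1 - p2))
    return out


def loglik_direct(data: Sequence[JointAction], ne: FrozenSet[JointAction], q: float, n: int) -> float:
    """Average of log p_(G,q)(x) over the data, straight from eq. (3)."""
    k = len(ne)
    return sum(
        math.log(q / k) if x in ne else math.log((1 - q) / (2 ** n - k)) for x in data
    ) / len(data)


def loglik_kl(data: Sequence[JointAction], ne: FrozenSet[JointAction], q: float, n: int) -> float:
    """The same quantity written as in eq. (7)."""
    pi = len(ne) / 2 ** n                               # eq. (4)
    pi_hat = sum(x in ne for x in data) / len(data)     # eq. (6)
    return kl_bernoulli(pi_hat, pi) - kl_bernoulli(pi_hat, q) - n * math.log(2)


def optimal_q(pi_hat: float, m: int) -> float:
    return min(pi_hat, 1 - 1 / (2 * m))


if __name__ == "__main__":
    ne = frozenset({(-1, -1, -1), (1, 1, 1)})
    data = [(1, 1, 1)] * 6 + [(-1, -1, -1)] * 3 + [(1, -1, 1)]
    q = optimal_q(0.9, len(data))
    print(loglik_direct(data, ne, q, 3), loglik_kl(data, ne, q, 3))  # equal up to rounding
```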

Remark 8. Recall that a trivial game (e.g., $\mathcal{G} = (W, b)$, $W = 0$, $b = 0$, $\pi(\mathcal{G}) = 1$) induces a
uniform PMF by Remark 2, and therefore its log-likelihood is $-n\log 2$. Note that the lowest
log-likelihood for non-trivial identifiable games in eq. (7) is $-n\log 2$, by setting the optimal
mixture parameter $q = \hat\pi(\mathcal{G})$ and given that $KL(\hat\pi(\mathcal{G}) \| \pi(\mathcal{G})) \ge 0$.

Furthermore, eq. (7) implies that for non-trivial identifiable games $\mathcal{G}$, we expect the true
proportion of equilibria $\pi(\mathcal{G})$ to be strictly less than the empirical proportion of equilibria
$\hat\pi(\mathcal{G})$ in the given data. This follows from setting the optimal mixture parameter $q = \hat\pi(\mathcal{G})$ and the
condition $q > \pi(\mathcal{G})$ in our hypothesis space.



5 Generalization Bound and VC-Dimension 

In this section, we show a generalization bound for the maximum likelihood problem as
well as an upper bound on the VC-dimension of linear influence games. Our objective is to
establish that with probability at least $1 - \delta$, for some confidence parameter $0 < \delta < 1$,
the maximum likelihood estimate is within $\epsilon > 0$ of the optimal parameters, in terms of
achievable expected log-likelihood.

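Given the ground truth distribution $\mathcal{Q}$ of the data, let $\bar\pi(\mathcal{G})$ be the expected proportion
of equilibria, i.e.:

$$\bar\pi(\mathcal{G}) = \mathbb{P}_{\mathcal{Q}}[x \in \mathcal{NE}(\mathcal{G})] \tag{8}$$

and let $\bar{\mathcal{L}}(\mathcal{G}, q)$ be the expected log-likelihood of a generative model from game $\mathcal{G}$ and mixture
parameter $q$, i.e.:

$$\bar{\mathcal{L}}(\mathcal{G}, q) = \mathbb{E}_{\mathcal{Q}}[\log p_{(\mathcal{G},q)}(x)] \tag{9}$$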

Note that our hypothesis space $\Upsilon$ in eq. (7) includes a continuous parameter $q$ that could
potentially have infinite VC-dimension. The following lemma will allow us later to prove
that uniform convergence for the extreme values of $q$ implies uniform convergence for all $q$
in the domain.

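Lemma 9. Consider any game $\mathcal{G}$ and, for $0 < q'' < q' < q < 1$, let $\theta = (\mathcal{G}, q)$, $\theta' = (\mathcal{G}, q')$
and $\theta'' = (\mathcal{G}, q'')$. If for any $\epsilon > 0$ we have $|\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| \le \epsilon/2$ and $|\hat{\mathcal{L}}(\theta'') - \bar{\mathcal{L}}(\theta'')| \le \epsilon/2$,
then $|\hat{\mathcal{L}}(\theta') - \bar{\mathcal{L}}(\theta')| \le \epsilon/2$.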

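Proof. Let $\mathcal{NE} \equiv \mathcal{NE}(\mathcal{G})$, $\pi \equiv \pi(\mathcal{G})$, $\hat\pi \equiv \hat\pi(\mathcal{G})$, $\bar\pi \equiv \bar\pi(\mathcal{G})$, and let $\mathbb{E}[\cdot]$ and $\mathbb{P}[\cdot]$ be the
expectation and probability with respect to the ground truth distribution $\mathcal{Q}$ of the data.

First note that for any $\theta = (\mathcal{G}, q)$, we have

$$\begin{aligned}
\bar{\mathcal{L}}(\theta) &= \mathbb{E}[\log p_{(\mathcal{G},q)}(x)] = \mathbb{E}\left[1[x \in \mathcal{NE}]\log\frac{q}{|\mathcal{NE}|} + 1[x \notin \mathcal{NE}]\log\frac{1 - q}{2^n - |\mathcal{NE}|}\right] \\
&= \mathbb{P}[x \in \mathcal{NE}]\log\frac{q}{|\mathcal{NE}|} + \mathbb{P}[x \notin \mathcal{NE}]\log\frac{1 - q}{2^n - |\mathcal{NE}|} = \bar\pi\log\frac{q}{|\mathcal{NE}|} + (1 - \bar\pi)\log\frac{1 - q}{2^n - |\mathcal{NE}|} \\
&= \bar\pi\log\left(\frac{q}{1 - q}\cdot\frac{2^n - |\mathcal{NE}|}{|\mathcal{NE}|}\right) + \log\frac{1 - q}{2^n - |\mathcal{NE}|} = \bar\pi\log\left(\frac{q}{1 - q}\cdot\frac{1 - \pi}{\pi}\right) + \log\frac{1 - q}{1 - \pi} - n\log 2.
\end{aligned}$$

Similarly, for any $\theta = (\mathcal{G}, q)$, we have $\hat{\mathcal{L}}(\theta) = \hat\pi\log\left(\frac{q}{1 - q}\cdot\frac{1 - \pi}{\pi}\right) + \log\frac{1 - q}{1 - \pi} - n\log 2$, so that

$$\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta) = (\hat\pi - \bar\pi)\log\left(\frac{q}{1 - q}\cdot\frac{1 - \pi}{\pi}\right).$$

Furthermore, the function $\frac{q}{1 - q}$ is strictly monotonically increasing for $0 < q < 1$. If
$\hat\pi > \bar\pi$ then $-\epsilon/2 \le \hat{\mathcal{L}}(\theta'') - \bar{\mathcal{L}}(\theta'') < \hat{\mathcal{L}}(\theta') - \bar{\mathcal{L}}(\theta') < \hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta) \le \epsilon/2$. Else, if $\hat\pi < \bar\pi$, we
have $\epsilon/2 \ge \hat{\mathcal{L}}(\theta'') - \bar{\mathcal{L}}(\theta'') > \hat{\mathcal{L}}(\theta') - \bar{\mathcal{L}}(\theta') > \hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta) \ge -\epsilon/2$. Finally, if $\hat\pi = \bar\pi$ then
$\hat{\mathcal{L}}(\theta'') - \bar{\mathcal{L}}(\theta'') = \hat{\mathcal{L}}(\theta') - \bar{\mathcal{L}}(\theta') = \hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta) = 0$. □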
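The key identity in the proof of Lemma 9, $\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta) = (\hat\pi - \bar\pi)\log\left(\frac{q}{1-q}\cdot\frac{1-\pi}{\pi}\right)$, can be verified on a toy instance in which the ground-truth distribution $\mathcal{Q}$ is an explicit PMF over $\{-1,+1\}^n$, so that the expectation in eq. (9) becomes an exact finite sum. The equilibria set, the distribution `Q`, the data set, and all function names below are hypothetical.

```python
import itertools
import math

n = 2
ne = frozenset({(-1, -1), (1, 1)})          # NE(G) of some non-trivial game
joint = list(itertools.product((-1, 1), repeat=n))
Q = dict(zip(joint, [0.4, 0.1, 0.2, 0.3]))  # hypothetical ground-truth PMF
data = [(1, 1), (1, 1), (-1, -1), (1, -1)]  # hypothetical sample, m = 4


def log_p(x, q):
    """log p_(G,q)(x) from eq. (3)."""
    k = len(ne)
    return math.log(q / k) if x in ne else math.log((1 - q) / (2 ** n - k))


def empirical_ll(q):  # \hat{L}(G, q)
    return sum(log_p(x, q) for x in data) / len(data)


def expected_ll(q):   # \bar{L}(G, q), eq. (9)
    return sum(Q[x] * log_p(x, q) for x in joint)


pi = len(ne) / 2 ** n                            # eq. (4)
pi_hat = sum(x in ne for x in data) / len(data)  # eq. (6)
pi_bar = sum(Q[x] for x in ne)                   # eq. (8)

if __name__ == "__main__":
    for q in (0.6, 0.75, 0.9):
        gap = empirical_ll(q) - expected_ll(q)
        closed_form = (pi_hat - pi_bar) * math.log(q / (1 - q) * (1 - pi) / pi)
        print(q, gap, closed_form)  # the two columns agree
```

Because $\hat\pi > \bar\pi$ in this toy instance, the gap grows monotonically with $q$, which is exactly the monotonicity the lemma exploits.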

The following theorem shows that the expected log-likelihood of the maximum likelihood 
estimate converges in probability to that of the optimal, as the data size m increases. 






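Theorem 10. Let $\hat\theta = (\hat{\mathcal{G}}, \hat q)$ be the maximum likelihood estimate in eq. (7) and $\bar\theta =
(\bar{\mathcal{G}}, \bar q)$ be the maximum expected likelihood estimate, i.e. $\hat\theta = \arg\max_{\theta \in \Upsilon}\hat{\mathcal{L}}(\theta)$ and $\bar\theta =
\arg\max_{\theta \in \Upsilon}\bar{\mathcal{L}}(\theta)$; then with probability at least $1 - \delta$:

$$\bar{\mathcal{L}}(\hat\theta) \ge \bar{\mathcal{L}}(\bar\theta) - \left(\log\max\left(2m, \frac{1}{1 - \bar q}\right) + n\log 2\right)\sqrt{\frac{2}{m}\left(\log d(\mathcal{H}) + \log\frac{4}{\delta}\right)} \tag{10}$$

where $\mathcal{H}$ is the class of games of interest, $\Upsilon = \{(\mathcal{G}, q) \mid \mathcal{G} \in \mathcal{H} \wedge 0 < \pi(\mathcal{G}) < q < 1\}$ is
the hypothesis space of non-trivial identifiable games and $d(\mathcal{H}) \equiv |\cup_{\mathcal{G} \in \mathcal{H}}\{\mathcal{NE}(\mathcal{G})\}|$ is the
number of all possible games in $\mathcal{H}$ (identified by their Nash equilibria sets).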

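Proof. First our objective is to find a lower bound for

$$\begin{aligned}
\mathbb{P}[\bar{\mathcal{L}}(\hat\theta) - \bar{\mathcal{L}}(\bar\theta) \ge -\epsilon] &\ge \mathbb{P}[\bar{\mathcal{L}}(\hat\theta) - \bar{\mathcal{L}}(\bar\theta) \ge -\epsilon + (\hat{\mathcal{L}}(\hat\theta) - \hat{\mathcal{L}}(\bar\theta))] \\
&\ge \mathbb{P}[-\hat{\mathcal{L}}(\hat\theta) + \bar{\mathcal{L}}(\hat\theta) \ge -\tfrac{\epsilon}{2},\ \hat{\mathcal{L}}(\bar\theta) - \bar{\mathcal{L}}(\bar\theta) \ge -\tfrac{\epsilon}{2}] \\
&= \mathbb{P}[\hat{\mathcal{L}}(\hat\theta) - \bar{\mathcal{L}}(\hat\theta) \le \tfrac{\epsilon}{2},\ \hat{\mathcal{L}}(\bar\theta) - \bar{\mathcal{L}}(\bar\theta) \ge -\tfrac{\epsilon}{2}] \\
&= 1 - \mathbb{P}[\hat{\mathcal{L}}(\hat\theta) - \bar{\mathcal{L}}(\hat\theta) > \tfrac{\epsilon}{2} \vee \hat{\mathcal{L}}(\bar\theta) - \bar{\mathcal{L}}(\bar\theta) < -\tfrac{\epsilon}{2}],
\end{aligned}$$

where the first inequality holds because $\hat{\mathcal{L}}(\hat\theta) \ge \hat{\mathcal{L}}(\bar\theta)$ by the definition of $\hat\theta$.

Let $\tilde q = \max(1 - \frac{1}{2m}, \bar q)$, so that both $\hat q \le 1 - \frac{1}{2m}$ and $\bar q$ are at most $\tilde q$. Now, we have

$$\mathbb{P}[\hat{\mathcal{L}}(\hat\theta) - \bar{\mathcal{L}}(\hat\theta) > \tfrac{\epsilon}{2} \vee \hat{\mathcal{L}}(\bar\theta) - \bar{\mathcal{L}}(\bar\theta) < -\tfrac{\epsilon}{2}] \le \mathbb{P}[(\exists\theta \in \Upsilon, q \le \tilde q)\ |\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| > \tfrac{\epsilon}{2}] = \mathbb{P}[(\exists\mathcal{G} \in \mathcal{H}, q \in \{\pi(\mathcal{G}), \tilde q\})\ |\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| > \tfrac{\epsilon}{2}].$$

The last equality follows from invoking Lemma 9.

Note that $\mathbb{E}[\hat{\mathcal{L}}(\theta)] = \bar{\mathcal{L}}(\theta)$ and that since $\pi(\mathcal{G}) \le q \le \tilde q$, the log-likelihood is bounded
as $(\forall x)\ -B \le \log p_{(\mathcal{G},q)}(x) \le 0$, where $B = \log\frac{1}{1 - \tilde q} + n\log 2 = \log\max(2m, \frac{1}{1 - \bar q}) + n\log 2$.
Therefore, by Hoeffding's inequality, we have $\mathbb{P}[|\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| > \frac{\epsilon}{2}] \le 2e^{-\frac{m\epsilon^2}{2B^2}}$.

Furthermore, note that there are $2d(\mathcal{H})$ possible parameters $\theta$, since we need to consider
only two values of $q \in \{\pi(\mathcal{G}), \tilde q\}$ and because the number of all possible games in $\mathcal{H}$
(identified by their Nash equilibria sets) is $d(\mathcal{H}) \equiv |\cup_{\mathcal{G} \in \mathcal{H}}\{\mathcal{NE}(\mathcal{G})\}|$. Therefore, by the union
bound we get the following uniform convergence:

$$\mathbb{P}[(\exists\mathcal{G} \in \mathcal{H}, q \in \{\pi(\mathcal{G}), \tilde q\})\ |\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| > \tfrac{\epsilon}{2}] \le 2d(\mathcal{H})\,\mathbb{P}[|\hat{\mathcal{L}}(\theta) - \bar{\mathcal{L}}(\theta)| > \tfrac{\epsilon}{2}] \le 4d(\mathcal{H})\,e^{-\frac{m\epsilon^2}{2B^2}} = \delta.$$

Finally, by solving for $\epsilon$ we prove our claim. □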
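To get a feel for eq. (10), the deviation term can be computed directly. The sketch below transcribes the right-hand side of eq. (10) for given $m$, $n$, $\bar q$, $\delta$ and $\log d(\mathcal{H})$ (natural logarithms throughout); the function name is ours, and `log_d` must be supplied by the user. The example plugs in $\log d(\mathcal{H}) = n^3 \log 2$, the value used for linear influence games in Corollary 12, with otherwise hypothetical numbers.

```python
import math


def deviation_term(m: int, n: int, q_bar: float, log_d: float, delta: float) -> float:
    """epsilon such that, w.p. >= 1 - delta, L_bar(theta_hat) >= L_bar(theta_bar) - epsilon."""
    B = math.log(max(2 * m, 1 / (1 - q_bar))) + n * math.log(2)  # range of log p_(G,q)(x)
    return B * math.sqrt(2 / m * (log_d + math.log(4 / delta)))


if __name__ == "__main__":
    # Hypothetical setting: n = 20 players, q_bar = 0.9, delta = 0.05.
    n = 20
    for m in (10**3, 10**5, 10**7):
        print(m, deviation_term(m, n, 0.9, n**3 * math.log(2), 0.05))
```

The term shrinks roughly like $\log m / \sqrt{m}$, matching the convergence-in-probability statement preceding the theorem; setting it equal to the Hoeffding-plus-union-bound expression $4d(\mathcal{H})e^{-m\epsilon^2/(2B^2)} = \delta$ recovers $\delta$ exactly.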

The following theorem establishes the complexity of the class of linear influence games,
which implies that the term $\log d(\mathcal{H})$ of the generalization bound in Theorem 10 is only
polynomial in the number of players $n$.



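Theorem 11. Let $\mathcal{H}$ be the class of linear influence games. Then $d(\mathcal{H}) = |\cup_{\mathcal{G} \in \mathcal{H}}\{\mathcal{NE}(\mathcal{G})\}| \le
2^{\frac{n^2(n+1)}{2} + 1} \le 2^{n^3}$.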



Proof. The logarithm of the number of possible pure-strategy Nash equilibria sets supported
by $\mathcal{H}$ (i.e., that can be produced by some game in $\mathcal{H}$) is upper bounded by the VC-dimension
of the class of neural networks with a single hidden layer of $n$ units and $n + \binom{n}{2}$ input units,
linear threshold activation functions, and constant output weights.

For every linear influence game $\mathcal{G} = (W, b)$ in $\mathcal{H}$, define the neural network with a
single layer of $n$ hidden units: $n$ of the inputs correspond to the linear terms $x_1, \ldots, x_n$
and $\binom{n}{2}$ correspond to the quadratic polynomial terms $x_i x_j$ for all pairs of players $(i, j)$,
$1 \le i < j \le n$. For every hidden unit $i$, the weight corresponding to the linear term $x_i$ is $b_i$
(and 0 for every other linear term), while the weight corresponding to each quadratic term involving
player $i$ ($x_i x_j$ or $x_j x_i$) is $-w_{ij}$ (and 0 for every quadratic term not involving player $i$), so that
hidden unit $i$ receives $-x_i(w_{i,-i}^T x_{-i} - b_i)$. The weights of the bias term of all the hidden units
are set to 0. All $n$ output weights are set to 1 while the weight of the output bias term is set to 0.
The output of the neural network is $1[x \notin \mathcal{NE}(\mathcal{G})]$. Note that we define the neural network to
classify non-equilibrium as opposed to equilibrium to keep the convention in the neural network
literature to define the threshold function to output 0 for input 0. The alternative is to redefine
the threshold function to output 1 instead for input 0.



Finally, we use the VC-dimension of neural networks [Sontag, 1998]. □
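The reduction behind Theorem 11 can be checked on small random instances: a single hidden layer of threshold units over the $n$ linear terms and the $\binom{n}{2}$ products $x_i x_j$, followed by an OR-like output unit, reproduces $1[x \notin \mathcal{NE}(\mathcal{G})]$. The sketch below uses one weight assignment that realizes this (hidden unit $i$ gives weight $b_i$ to $x_i$, weight $-w_{ij}$ to each product involving $i$, and 0 elsewhere; zero biases; unit output weights) with a threshold that outputs 1 only for strictly positive input. Integer weights are an assumption we make so that ties are exercised and arithmetic is exact; the function names are hypothetical.

```python
import itertools

import numpy as np


def step(z: np.ndarray) -> np.ndarray:
    """Threshold activation: 1 for strictly positive input, 0 otherwise (0 at input 0)."""
    return (z > 0).astype(int)


def network_output(W: np.ndarray, b: np.ndarray, x: np.ndarray) -> int:
    n = len(b)
    pairs = list(itertools.combinations(range(n), 2))
    features = np.concatenate([x, [x[i] * x[j] for i, j in pairs]])  # n + C(n,2) inputs
    H = np.zeros((n, len(features)))
    for i in range(n):
        H[i, i] = b[i]                                   # linear term x_i
        for col, (j, k) in enumerate(pairs, start=n):    # quadratic terms x_j x_k
            if i == j:
                H[i, col] = -W[i, k]
            elif i == k:
                H[i, col] = -W[i, j]
    hidden = step(H @ features)  # unit i fires iff x_i(w_{i,-i}^T x_{-i} - b_i) < 0
    return int(step(np.array([hidden.sum()]))[0])        # output weights 1, bias 0


def not_equilibrium(W: np.ndarray, b: np.ndarray, x: np.ndarray) -> int:
    """Direct evaluation of 1[x not in NE(G)] via eq. (2)."""
    return int(np.any(x * (W @ x - b) < 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    W = rng.integers(-2, 3, size=(n, n)).astype(float)
    np.fill_diagonal(W, 0.0)
    b = rng.integers(-2, 3, size=n).astype(float)
    for x in itertools.product((-1, 1), repeat=n):
        xv = np.array(x, dtype=float)
        assert network_output(W, b, xv) == not_equilibrium(W, b, xv)
    print("network reproduces 1[x not in NE(G)] on all", 2**n, "joint-actions")
```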



From Theorems 10 and 11 we state the generalization bounds for linear influence games.



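Corollary 12. Let $\hat\theta = (\hat{\mathcal{G}}, \hat q)$ be the maximum likelihood estimate in eq. (7) and $\bar\theta =
(\bar{\mathcal{G}}, \bar q)$ be the maximum expected likelihood estimate, i.e. $\hat\theta = \arg\max_{\theta \in \Upsilon}\hat{\mathcal{L}}(\theta)$ and $\bar\theta =
\arg\max_{\theta \in \Upsilon}\bar{\mathcal{L}}(\theta)$; then with probability at least $1 - \delta$:

$$\bar{\mathcal{L}}(\hat\theta) \ge \bar{\mathcal{L}}(\bar\theta) - \left(\log\max\left(2m, \frac{1}{1 - \bar q}\right) + n\log 2\right)\sqrt{\frac{2}{m}\left(n^3\log 2 + \log\frac{4}{\delta}\right)} \tag{11}$$

where $\mathcal{H}$ is the class of linear influence games and $\Upsilon = \{(\mathcal{G}, q) \mid \mathcal{G} \in \mathcal{H} \wedge 0 < \pi(\mathcal{G}) < q < 1\}$
is the hypothesis space of non-trivial identifiable linear influence games.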



6 Algorithms 

In this section, we approximate the maximum likelihood problem by maximizing
the number of observed equilibria in the data, suitable for a hypothesis space of games with
small true proportion of equilibria. We then present our convex loss minimization approach.
We also discuss baseline methods such as sigmoidal approximation and exhaustive search.

First, we discuss some negative results that justify the use of simple approaches. Counting
the number of Nash equilibria is NP-hard for influence games, and so is computing the
log-likelihood function and therefore maximum likelihood estimation. This is not a
disadvantage relative to probabilistic graphical models, since computing the log-likelihood
function is also NP-hard for Ising models and Markov random fields in general, while learning
is also NP-hard for Bayesian networks. General approximation techniques such as
pseudo-likelihood estimation do not lead to tractable methods for learning linear influence games.
From an optimization perspective, the log-likelihood function is not continuous because it depends
on the (integer-valued) number of equilibria. Therefore, we cannot rely on concepts such as Lipschitz
continuity. Furthermore, bounding the number of equilibria by known bounds for Ising models
leads to trivial bounds. (Formal proofs and discussion are included in Appendix A.)

6.1 An Exact Quasi-Linear Method for General Games: Sample-Picking 

As a first approach, consider solving the maximum likelihood estimation problem in eq. (7)
by an exact exhaustive search algorithm. This algorithm iterates through all possible Nash
equilibria sets, i.e. for $s = 0, \ldots, 2^n$, we generate all possible sets of size $s$ with elements
from the joint-action space $\{-1, +1\}^n$. Recall that there exist $\binom{2^n}{s}$ such sets of size $s$,
and since $\sum_{s=0}^{2^n}\binom{2^n}{s} = 2^{2^n}$, the search space is super-exponential in the number of players
$n$.
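The count behind this super-exponential claim is easy to tabulate with the standard library: summing $\binom{2^n}{s}$ over all sizes $s$ gives $2^{2^n}$ candidate equilibria sets. The helper name `num_candidate_sets` is ours.

```python
from math import comb


def num_candidate_sets(n: int) -> int:
    """Number of subsets of {-1,+1}^n, i.e. sum over s = 0..2^n of C(2^n, s)."""
    size = 2 ** n
    return sum(comb(size, s) for s in range(size + 1))


if __name__ == "__main__":
    for n in range(1, 6):
        count = num_candidate_sets(n)
        assert count == 2 ** (2 ** n)  # binomial theorem
        print(n, count)  # 4, 16, 256, 65536, 4294967296
```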

Based on a few observations, we can obtain an $O(m\log m)$ algorithm for $m$ samples. First,
note that the above method does not constrain the set of Nash equilibria in any fashion.
Therefore, only joint-actions that are observed in the data are candidates for being Nash
equilibria in order to maximize the log-likelihood. This is because the introduction of an
unobserved joint-action will increase the true proportion of equilibria without increasing
the empirical proportion of equilibria, thus leading to a lower log-likelihood in eq. (7).
Second, given a fixed number of Nash equilibria $k$, the best strategy is to pick the
$k$ joint-actions that appear most frequently in the observed data. This maximizes the
empirical proportion of equilibria, which in turn maximizes the log-likelihood. Based on these
observations, we propose Algorithm 1.

Algorithm 1 Sample-Picking for General Games
Input: Dataset $\mathcal{D} = x^{(1)}, \ldots, x^{(m)}$
Compute the unique samples $y^{(1)}, \ldots, y^{(U)}$ and their frequencies $\hat p^{(1)}, \ldots, \hat p^{(U)}$ in the dataset $\mathcal{D}$
Sort joint-actions by their frequency such that $\hat p^{(1)} \ge \hat p^{(2)} \ge \cdots \ge \hat p^{(U)}$
for each unique sample $k = 1, \ldots, U$ do
    Define $\mathcal{G}_k$ by the Nash equilibria set $\mathcal{NE}(\mathcal{G}_k) = \{y^{(1)}, \ldots, y^{(k)}\}$
    Compute the log-likelihood $\hat{\mathcal{L}}(\mathcal{G}_k, \hat q_k)$ in eq. (7) (note that $\hat q_k = \hat\pi(\mathcal{G}_k) = \frac{1}{m}(\hat p^{(1)} + \cdots + \hat p^{(k)})$)
end for
Output: The game $\mathcal{G}_{\hat k}$ such that $\hat k = \arg\max_k \hat{\mathcal{L}}(\mathcal{G}_k, \hat q_k)$
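A minimal Python sketch of Algorithm 1 follows. It assumes joint-actions are hashable tuples in $\{-1,+1\}^n$ and uses natural logarithms. Beyond the algorithm box, it makes three assumptions explicit: it applies the shrunk mixture parameter $\hat q_k = \min(\hat\pi(\mathcal{G}_k), 1 - \frac{1}{2m})$ from Lemma 7, it skips $k = 2^n$ (a trivial game), and it only keeps candidates with $\pi(\mathcal{G}_k) < \hat q_k$, i.e. inside the hypothesis space $\Upsilon$, returning `None` if no candidate qualifies. The function name `sample_picking` is ours. Sorting the $U \le m$ unique samples dominates the cost, consistent with the $O(m\log m)$ claim.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

JointAction = Tuple[int, ...]


def kl_bernoulli(p1: float, p2: float) -> float:
    """KL(Bernoulli(p1) || Bernoulli(p2)) as in eq. (5), with 0 log 0 = 0."""
    out = 0.0
    if p1 > 0:
        out += p1 * math.log(p1 / p2)
    if p1 < 1:
        out += (1 - p1) * math.log((1 - p1) / (1 - p2))
    return out


def sample_picking(
    data: Sequence[JointAction], n: int
) -> Optional[Tuple[List[JointAction], float, float]]:
    """Return (NE set, q_hat, log-likelihood) of the best G_k, or None if none lies in Upsilon."""
    m = len(data)
    ranked = Counter(data).most_common()  # unique samples, most frequent first
    best = None
    cumulative = 0
    for k, (_, count) in enumerate(ranked, start=1):
        cumulative += count
        if k == 2 ** n:                   # G_k would be trivial
            break
        pi = k / 2 ** n                   # eq. (4)
        pi_hat = cumulative / m           # eq. (6)
        q_hat = min(pi_hat, 1 - 1 / (2 * m))
        if not pi < q_hat:                # outside the hypothesis space Upsilon
            continue
        ll = kl_bernoulli(pi_hat, pi) - kl_bernoulli(pi_hat, q_hat) - n * math.log(2)  # eq. (7)
        if best is None or ll > best[2]:
            best = ([y for y, _ in ranked[:k]], q_hat, ll)
    return best


if __name__ == "__main__":
    data = [(1, 1, 1)] * 5 + [(-1, -1, -1)] * 3 + [(1, -1, 1), (-1, 1, 1)]
    print(sample_picking(data, n=3))  # keeps the two most frequent joint-actions
```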

As an aside, the fact that general games do not constrain the set of Nash equilibria
makes the method more likely to over-fit. On the other hand, influence games will
potentially include unobserved equilibria given the linearity constraints in the search space,
and thus they would be less likely to over-fit.



6.2 An Exact Super-Exponential Method for Influence Games: Exhaustive Search

Note that in the previous subsection, we search in the space of all possible games, not only
the linear influence games. First note that sample-picking for linear games is NP-hard,
i.e. at any iteration of sample-picking, checking whether the set of Nash equilibria $\mathcal{NE}$
corresponds to an influence game or not is equivalent to the following constraint satisfaction
problem with linear constraints:

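$$\begin{aligned}
\min_{W, b}\quad & 1 \\
\text{s.t.}\quad & (\forall x \in \mathcal{NE})\ \ x_1(w_{1,-1}^T x_{-1} - b_1) \ge 0 \wedge \cdots \wedge x_n(w_{n,-n}^T x_{-n} - b_n) \ge 0 \\
& (\forall x \notin \mathcal{NE})\ \ x_1(w_{1,-1}^T x_{-1} - b_1) < 0 \vee \cdots \vee x_n(w_{n,-n}^T x_{-n} - b_n) < 0
\end{aligned} \tag{12}$$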



Note that eq. (12) contains "or" operators in order to account for the non-equilibria. This
makes the problem of finding the $(W, b)$ that satisfies such conditions NP-hard for a
non-empty complement set $\{-1, +1\}^n - \mathcal{NE}$. Furthermore, since sample-picking only considers
observed equilibria, the search is not optimal with respect to the space of influence games.

Regarding a more refined approach for enumerating influence games only, note that in an influence game each player separates hypercube vertices with a linear function, i.e. for $\mathbf{v} \equiv (\mathbf{w}_{i,-i}, b_i)$ and $\mathbf{y} \equiv (x_i \mathbf{x}_{-i}, -x_i) \in \{-1,+1\}^n$ we have $x_i(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i) = \mathbf{v}^\top \mathbf{y}$. If we assign a binary label to each vertex $\mathbf{y}$, then not all possible labelings are linearly separable. Labelings which are linearly separable are called linear threshold functions (LTFs).
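The following is a minimal sketch. It assumes a game is stored as a list-of-rows matrix `W` (diagonal ignored) and a bias list `b`. The sketch enumerates $\mathcal{NE}(\mathcal{G})$ by brute force over $\{-1,+1\}^n$, which is practical only for small $n$. It also checks the identity $x_i(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i) = \mathbf{v}^\top \mathbf{y}$ used above. The helper names are illustrative.

```python
from itertools import product


def margin(W, b, x, i):
    """x_i * (w_{i,-i}^T x_{-i} - b_i); player i best-responds at x iff this is >= 0."""
    n = len(x)
    return x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i])


def lifted_margin(W, b, x, i):
    """v^T y with v = (w_{i,-i}, b_i) and y = (x_i x_{-i}, -x_i)."""
    n = len(x)
    v = [W[i][j] for j in range(n) if j != i] + [b[i]]
    y = [x[i] * x[j] for j in range(n) if j != i] + [-x[i]]
    return sum(vk * yk for vk, yk in zip(v, y))


def nash_equilibria(W, b):
    """All pure-strategy Nash equilibria of the influence game (W, b), by enumeration."""
    n = len(b)
    return {x for x in product((-1, 1), repeat=n)
            if all(margin(W, b, x, i) >= 0 for i in range(n))}


if __name__ == "__main__":
    W = [[0, 1, -2], [1, 0, 1], [3, -1, 0]]
    b = [0, 1, -1]
    assert all(margin(W, b, x, i) == lifted_margin(W, b, x, i)
               for x in product((-1, 1), repeat=3) for i in range(3))
    print(sorted(nash_equilibria(W, b)))
```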



A lower bound on the number of LTFs was first provided in Muroga [1965], which showed that the number of LTFs is at least $\alpha(n) \equiv 2^{0.33048 n^2}$. Tighter lower bounds were shown later in Yajima and Ibaraki [1965] for $n \ge 6$ and in Muroga and Toda [1966] for $n \ge 8$. Regarding an upper bound, Winder [1960] showed that the number of LTFs is at most $\beta(n) \equiv 2^{n^2}$. By using such bounds for all players, we can conclude that there are at least $\alpha(n)^n = 2^{0.33048 n^3}$ and at most $\beta(n)^n = 2^{n^3}$ influence games (which is indeed another upper bound of the VC-dimension of the class of influence games; the bound in Theorem 11 is tighter and uses bounds of the VC-dimension of neural networks).
The bounds discussed above would bound the time-complexity of a search algorithm if we could easily enumerate all LTFs for a single player. Unfortunately, this seems to be far from a trivial problem. By using results in Muroga [1971], a weight vector $\mathbf{v}$ with integer entries such that $(\forall i)\ |v_i| \le \bar{\beta}(n) \equiv (n+1)^{(n+1)/2}/2^n$ is sufficient to realize all possible LTFs. Therefore we can conclude that enumerating influence games takes at most

$$(2\bar{\beta}(n) + 1)^{n^2}$$

steps, and we propose the use of this method only for $n \le 4$.



For n = 4 we found that the number of influence games is 23,706. Experimentally, we 
did not find differences between this method and sample-picking since most of the time, the 
model with maximum likelihood was an influence game. 
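For very small $n$, exhaustive search can be sketched as follows. This is an illustration under assumptions, not the authors' implementation. The sketch does not use the Muroga bound. Instead, it restricts integer entries to a small box `[-bound, bound]`, a hypothetical parameter, and with `bound=1` it may not reach every LTF. The sketch deduplicates distinct NE sets, skips games with an empty NE set, and scores each candidate with $\hat{q} = \hat{\pi}(\mathcal{G})$, as in sample-picking. The names are illustrative.

```python
import math
from itertools import product


def nash_equilibria(W, b):
    n = len(b)
    return frozenset(
        x for x in product((-1, 1), repeat=n)
        if all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0
               for i in range(n)))


def enumerate_ne_sets(n, bound=1):
    """Distinct NE sets of influence games whose integer entries lie in [-bound, bound]."""
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    values = range(-bound, bound + 1)
    seen = set()
    for params in product(values, repeat=len(off_diagonal) + n):
        W = [[0] * n for _ in range(n)]
        for (i, j), w in zip(off_diagonal, params):
            W[i][j] = w
        b = list(params[len(off_diagonal):])
        seen.add(nash_equilibria(W, b))
    return seen


def kl_bernoulli(a, c):
    kl = 0.0
    if a > 0.0:
        kl += a * math.log(a / c)
    if a < 1.0:
        kl += (1.0 - a) * math.log((1.0 - a) / (1.0 - c))
    return kl


def exhaustive_search(samples, n, bound=1):
    """NE set maximizing KL(pi_hat || pi) - n log 2, i.e. the log-likelihood with q_hat = pi_hat."""
    m = len(samples)
    best_ne, best_ll = None, -math.inf
    for ne in enumerate_ne_sets(n, bound):
        if not ne:
            continue  # the mixture model needs at least one equilibrium
        pi_hat = sum(1 for x in samples if x in ne) / m
        pi = len(ne) / 2 ** n
        ll = kl_bernoulli(pi_hat, pi) - n * math.log(2)
        if ll > best_ll:
            best_ne, best_ll = ne, ll
    return best_ne, best_ll
```

The loop visits $(2\,\text{bound}+1)^{n^2}$ parameter vectors. This is why such enumeration is only viable for tiny games.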



6.3 From Maximum Likelihood to Maximum Empirical Proportion of Equilibria

We approximately perform maximum likelihood estimation for influence games by maximizing the empirical proportion of equilibria, i.e. the equilibria in the observed data. This strategy allows us to avoid computing $\pi(\mathcal{G})$ for maximum likelihood estimation (given its dependence on $|\mathcal{NE}(\mathcal{G})|$). We propose this approach for games with a small true proportion of equilibria with high probability, i.e. with probability at least $1 - \delta$, we have $\pi(\mathcal{G}) \le \kappa^n/\delta$ for $1/2 \le \kappa < 1$. In particular, we will show in Section 7 that for influence games we have $\kappa = 3/4$. Given this, our approximate problem relies on a bound of the log-likelihood that holds with high probability. We also show that under very mild conditions, the parameters $(\mathcal{G}, \hat{q})$ belong to the hypothesis space of the original problem with high probability.

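First, we derive bounds on the log-likelihood function.

Lemma 13. Given a non-trivial game $\mathcal{G}$ with $0 < \pi(\mathcal{G}) < \hat{\pi}(\mathcal{G})$, the KL divergence in the log-likelihood function is bounded as follows:

$$-\hat{\pi}(\mathcal{G}) \log \pi(\mathcal{G}) - \log 2 \le KL(\hat{\pi}(\mathcal{G}) \,\|\, \pi(\mathcal{G})) \le -\hat{\pi}(\mathcal{G}) \log \pi(\mathcal{G}) \tag{13}$$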

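Proof. Let $\hat{\pi} \equiv \hat{\pi}(\mathcal{G})$ and $\pi \equiv \pi(\mathcal{G})$. Note that $\beta(\pi) \equiv \lim_{\hat{\pi} \to 1} KL(\hat{\pi} \| \pi) = -\log \pi \le n \log 2$, and that $KL(\hat{\pi} \| \pi) = -\hat{\pi} \log \pi - H(\hat{\pi}) - (1 - \hat{\pi}) \log(1 - \pi)$, where $H$ is the binary entropy. Since $\pi < \hat{\pi}$, we have $H(\hat{\pi}) \ge -(1 - \hat{\pi}) \log(1 - \hat{\pi}) \ge -(1 - \hat{\pi}) \log(1 - \pi)$, and therefore $KL(\hat{\pi} \| \pi) \le -\hat{\pi} \log \pi = \beta(\pi)\hat{\pi}$. That is, the line through the origin with slope $\beta(\pi)$ upper-bounds the (convex) function.

To find a lower bound, we find the point at which the derivative of the original function equals the slope of the upper bound, i.e. $\frac{\partial KL(\hat{\pi} \| \pi)}{\partial \hat{\pi}} = -\log \pi$, which gives $\hat{\pi}^* = \frac{1}{2 - \pi}$. Then, the maximum difference between the upper bound and the original function is given by $\lim_{\pi \to 0} -\hat{\pi}^* \log \pi - KL(\hat{\pi}^* \| \pi) = \log 2$. □

Note that the lower and upper bounds are very informative when $\pi(\mathcal{G}) \to 0$ (or in our setting when $n \to +\infty$), since $\log 2$ becomes small compared to $-\log \pi(\mathcal{G})$, as shown in Figure 2.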
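The bounds of Lemma 13 can be checked numerically. The sketch below is plain Python with hypothetical helper names. For $\pi = 2^{-n}$, it evaluates the gap between the upper bound $-\hat{\pi}\log\pi$ and $KL(\hat{\pi}\|\pi)$ on a grid of $\hat{\pi} \in (\pi, 1]$. It then compares that gap with the gap at the tangent point $\hat{\pi}^* = 1/(2-\pi)$ from the proof. The gap should stay in $[0, \log 2]$ and approach $\log 2$ as $\pi \to 0$.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """KL(a || b) between Bernoulli(a) and Bernoulli(b), with 0 log 0 = 0."""
    kl = 0.0
    if a > 0.0:
        kl += a * math.log(a / b)
    if a < 1.0:
        kl += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return kl


def lemma13_gap(pi_hat: float, pi: float) -> float:
    """Upper bound -pi_hat*log(pi) minus KL(pi_hat || pi); Lemma 13 says 0 <= gap <= log 2."""
    return -pi_hat * math.log(pi) - kl_bernoulli(pi_hat, pi)


def grid_gaps(pi: float, steps: int = 2000):
    """Gaps on a uniform grid of pi_hat values in (pi, 1]."""
    return [lemma13_gap(pi + (1.0 - pi) * k / steps, pi) for k in range(1, steps + 1)]


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        pi = 2.0 ** -n  # e.g. a game with a single equilibrium
        gaps = grid_gaps(pi)
        tangent_gap = lemma13_gap(1.0 / (2.0 - pi), pi)  # gap at pi_hat* = 1 / (2 - pi)
        print(f"n={n:2d} min={min(gaps):.2e} max={max(gaps):.6f} "
              f"at pi_hat*={tangent_gap:.6f} log2={math.log(2):.6f}")
```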

Next, we derive the problem of maximizing the empirical proportion of equilibria from 
the maximum likelihood estimation problem. 

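Theorem 14. Assume that with probability at least $1 - \delta$ we have $\pi(\mathcal{G}) \le \frac{\kappa^n}{\delta}$ for $\frac{1}{2} \le \kappa < 1$. Maximizing a lower bound (with high probability) of the log-likelihood is equivalent to maximizing the empirical proportion of equilibria:

$$\max_{\mathcal{G} \in \mathcal{H}} \hat{\pi}(\mathcal{G}) \tag{14}$$

Furthermore, for all games $\mathcal{G}$ such that $\hat{\pi}(\mathcal{G}) \ge \gamma$ for some $0 < \gamma < 1/2$, for sufficiently large $n > \log_\kappa(\delta\gamma)$ and optimal mixture parameter $\hat{q} = \min(\hat{\pi}(\mathcal{G}), 1 - \frac{1}{2m})$, we have $(\mathcal{G}, \hat{q}) \in \Upsilon$, where $\Upsilon = \{(\mathcal{G}, q) \mid \mathcal{G} \in \mathcal{H} \wedge 0 < \pi(\mathcal{G}) < q < 1\}$ is the hypothesis space of non-trivial identifiable games.

Proof. By applying the lower bound in Lemma 13 to the log-likelihood of non-trivial games, we have $\hat{\mathcal{L}}(\mathcal{G}, q) = KL(\hat{\pi}(\mathcal{G}) \| \pi(\mathcal{G})) - KL(\hat{\pi}(\mathcal{G}) \| q) - n\log 2 \ge -\hat{\pi}(\mathcal{G}) \log \pi(\mathcal{G}) - KL(\hat{\pi}(\mathcal{G}) \| q) - (n+1)\log 2$. Since $\pi(\mathcal{G}) \le \frac{\kappa^n}{\delta}$, we have $-\log \pi(\mathcal{G}) \ge -\log \frac{\kappa^n}{\delta}$. Therefore $\hat{\mathcal{L}}(\mathcal{G}, q) \ge -\hat{\pi}(\mathcal{G}) \log \frac{\kappa^n}{\delta} - KL(\hat{\pi}(\mathcal{G}) \| q) - (n+1)\log 2$. Consider the term $KL(\hat{\pi}(\mathcal{G}) \| q)$ at $q = \hat{q}$. If $\hat{\pi}(\mathcal{G}) < 1$, then $\hat{\pi}(\mathcal{G}) \le 1 - \frac{1}{m}$, so $\hat{q} = \hat{\pi}(\mathcal{G})$ and $KL(\hat{\pi}(\mathcal{G}) \| \hat{q}) = 0$. If $\hat{\pi}(\mathcal{G}) = 1$, then $KL(\hat{\pi}(\mathcal{G}) \| \hat{q}) = KL(1 \| 1 - \frac{1}{2m}) = -\log(1 - \frac{1}{2m}) \le \log 2$, which approaches 0 when $m \to +\infty$. Removing the constant terms that do not depend on $\mathcal{G}$, maximizing the lower bound of the log-likelihood becomes $\max_{\mathcal{G} \in \mathcal{H}} \hat{\pi}(\mathcal{G})$.

In order to prove $(\mathcal{G}, \hat{q}) \in \Upsilon$ we need to prove $0 < \pi(\mathcal{G}) < \hat{q} < 1$. For the first inequality $0 < \pi(\mathcal{G})$, note that $\hat{\pi}(\mathcal{G}) \ge \gamma > 0$, and therefore $\mathcal{G}$ has at least one equilibrium. For the third inequality $\hat{q} < 1$, note that $\hat{q} = \min(\hat{\pi}(\mathcal{G}), 1 - \frac{1}{2m}) < 1$. For the second inequality $\pi(\mathcal{G}) < \hat{q}$, we need to prove $\pi(\mathcal{G}) < \hat{\pi}(\mathcal{G})$ and $\pi(\mathcal{G}) < 1 - \frac{1}{2m}$. Since $\pi(\mathcal{G}) \le \frac{\kappa^n}{\delta}$ and $\gamma \le \hat{\pi}(\mathcal{G})$, it suffices to show $\frac{\kappa^n}{\delta} < \gamma$, which implies $\pi(\mathcal{G}) < \hat{\pi}(\mathcal{G})$. Similarly, it suffices to show $\frac{\kappa^n}{\delta} < 1 - \frac{1}{2m}$, which implies $\pi(\mathcal{G}) < 1 - \frac{1}{2m}$. Putting both together, we need $\frac{\kappa^n}{\delta} < \min(\gamma, 1 - \frac{1}{2m}) = \gamma$, since $\gamma < 1/2$ and $1 - \frac{1}{2m} \ge 1/2$. Finally, $\frac{\kappa^n}{\delta} < \gamma \Leftrightarrow n > \log_\kappa(\delta\gamma)$. □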
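A small numeric sketch of the conditions in Theorem 14. It uses $\kappa = 3/4$, the value established for influence games in Section 7. The values $\delta = 0.05$ and $\gamma = 0.1$ are arbitrary illustrative choices, not values from the paper. The function names are hypothetical.

```python
import math


def min_players(kappa: float, delta: float, gamma: float) -> int:
    """Smallest integer n with n > log_kappa(delta * gamma), i.e. kappa**n / delta < gamma."""
    threshold = math.log(delta * gamma) / math.log(kappa)  # change of base for log_kappa
    return math.floor(threshold) + 1


def optimal_mixture(pi_hat: float, m: int) -> float:
    """q_hat = min(pi_hat, 1 - 1/(2m)) from Theorem 14."""
    return min(pi_hat, 1.0 - 1.0 / (2 * m))


if __name__ == "__main__":
    n = min_players(kappa=0.75, delta=0.05, gamma=0.1)
    print(n, 0.75 ** n / 0.05)
    print(optimal_mixture(1.0, m=50), optimal_mixture(0.3, m=50))
```

With these values, the smallest admissible number of players is $n = 19$. This follows from $(3/4)^{19}/0.05 \approx 0.085 < 0.1 \le (3/4)^{18}/0.05 \approx 0.113$.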
6.4 A Non-Concave Maximization Method: Sigmoidal Approximation

A very simple optimization approach can be devised by using a sigmoid to approximate the 0/1 function $1[z \ge 0]$. We use it in the maximum likelihood problem, and also when maximizing the empirical proportion of equilibria as in eq.(14). We use the following sigmoidal approximation:

$$1[z \ge 0] \approx H_{\alpha,\beta}(z) \equiv \frac{1}{2}\left(1 + \tanh\left(\frac{z}{\beta} - \operatorname{arctanh}\left(1 - 2\alpha^{1/n}\right)\right)\right) \tag{15}$$

The additional term $\alpha$ ensures that for $\mathcal{G} = (\mathbf{W}, \mathbf{b})$ with $\mathbf{W} = \mathbf{0}$ and $\mathbf{b} = \mathbf{0}$ we get $1[\mathbf{x} \in \mathcal{NE}(\mathcal{G})] \approx H_{\alpha,\beta}(0)^n = \alpha$. We perform gradient ascent on these objective functions, which have many local maxima. Note that when maximizing the "sigmoidal" likelihood, each step of the gradient ascent is NP-hard due to the "sigmoidal" true proportion of equilibria. Therefore, we propose the use of the sigmoidal maximum likelihood only for $n \le 15$.
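A direct transcription of eq.(15) in plain Python, offered as a sketch. `soft_equilibrium` replaces each factor of the equilibrium indicator by $H_{\alpha,\beta}$. The defaults $\alpha = 0.1$ and $\beta = 0.001$ are the values reported in Section 8. The names are illustrative, and the gradient ascent itself is not shown.

```python
import math


def h_sigmoid(z: float, alpha: float, beta: float, n: int) -> float:
    """H_{alpha,beta}(z) = 0.5 * (1 + tanh(z/beta - arctanh(1 - 2 alpha^(1/n)))), eq. (15)."""
    shift = math.atanh(1.0 - 2.0 * alpha ** (1.0 / n))
    return 0.5 * (1.0 + math.tanh(z / beta - shift))


def soft_equilibrium(W, b, x, alpha, beta):
    """Smooth surrogate of 1[x in NE(G)] = prod_i 1[x_i (w_{i,-i}^T x_{-i} - b_i) >= 0]."""
    n = len(x)
    value = 1.0
    for i in range(n):
        f_i = sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]
        value *= h_sigmoid(x[i] * f_i, alpha, beta, n)
    return value


def soft_empirical_proportion(W, b, samples, alpha=0.1, beta=0.001):
    """Sigmoidal version of pi_hat(G), the objective of eq. (14)."""
    return sum(soft_equilibrium(W, b, x, alpha, beta) for x in samples) / len(samples)
```

Since $H_{\alpha,\beta}(0) = \alpha^{1/n}$, the all-zero game scores exactly $\alpha$ on every joint action, rather than 1.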

In our implementation, we add an $\ell_1$-norm regularizer $-\rho\|\mathbf{W}\|_1$, where $\rho > 0$, to both maximization problems. The $\ell_1$-norm regularizer encourages sparseness and attempts to lower the generalization error by controlling over-fitting.

6.5 Our Proposed Approach: Convex Loss Minimization 

From an optimization perspective, it is more convenient to minimize a convex objective 
instead of a sigmoidal approximation in order to avoid the many local minima. 

Note that maximizing the empirical proportion of equilibria in eq.(14) is equivalent to minimizing the empirical proportion of non-equilibria, i.e. $\min_{\mathcal{G} \in \mathcal{H}} (1 - \hat{\pi}(\mathcal{G}))$. Furthermore, $1 - \hat{\pi}(\mathcal{G}) = \frac{1}{m}\sum_l 1[\mathbf{x}^{(l)} \notin \mathcal{NE}(\mathcal{G})]$. Denote by $\ell$ the 0/1 loss, i.e. $\ell(z) = 1[z < 0]$. For influence games, maximizing the empirical proportion of equilibria in eq.(14) is equivalent to solving the loss minimization problem:

$$\min_{\mathbf{W},\mathbf{b}} \frac{1}{m} \sum_l \max_i \ell\left(x_i^{(l)}(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i}^{(l)} - b_i)\right) \tag{16}$$

We can further relax this problem by introducing convex upper bounds of the 0/1 loss. Note that the use of convex losses also avoids the trivial solution of eq.(16), i.e. $\mathbf{W} = \mathbf{0}$, $\mathbf{b} = \mathbf{0}$ (which obtains the lowest log-likelihood as discussed in Remark 8). Intuitively speaking, minimizing the logistic loss $\ell(z) = \log(1 + e^{-z})$ will push $z \to +\infty$, and minimizing the hinge loss $\ell(z) = \max(0, 1 - z)$ will push $z \ge 1$. By contrast, the 0/1 loss $\ell(z) = 1[z < 0]$ is already minimized at $z = 0$. In what follows, we develop four efficient methods for solving eq.(16) under specific choices of loss functions, i.e. hinge and logistic.
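To illustrate the trivial solution mentioned above, the sketch below evaluates the objective of eq.(16) with the 0/1 loss and with the two convex surrogates. At $\mathbf{W} = \mathbf{0}$, $\mathbf{b} = \mathbf{0}$ every margin is zero. The 0/1 objective is therefore already 0, while the hinge and logistic objectives equal 1 and $\log 2$. The helper names are hypothetical.

```python
import math


def margins(W, b, x):
    """z_i = x_i (w_{i,-i}^T x_{-i} - b_i) for every player i (diagonal of W ignored)."""
    n = len(x)
    return [x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) for i in range(n)]


def zero_one(z):
    return 1.0 if z < 0 else 0.0


def hinge(z):
    return max(0.0, 1.0 - z)


def logistic(z):
    # log(1 + e^{-z}), written to avoid overflow for large negative z
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def objective_16(W, b, samples, loss=zero_one):
    """(1/m) sum_l max_i loss(z_i^(l)); with the 0/1 loss this equals 1 - pi_hat(G)."""
    return sum(max(loss(z) for z in margins(W, b, x)) for x in samples) / len(samples)
```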

In our implementation, we add an $\ell_1$-norm regularizer $\rho\|\mathbf{W}\|_1$, where $\rho > 0$, to all the minimization problems. The $\ell_1$-norm regularizer encourages sparseness and attempts to lower the generalization error by controlling over-fitting.

Independent Support Vector Machines and Logistic Regression. We can relax the loss minimization problem in eq.(16) by using the loose bound $\max_i \ell(z_i) \le \sum_i \ell(z_i)$. This relaxation simplifies the original problem into several independent problems. For each player $i$, we train the weights $(\mathbf{w}_{i,-i}, b_i)$ in order to predict independent (disjoint) actions. This leads to 1-norm SVMs of Bradley and Mangasarian [1998], Zhu et al. [2003] and $\ell_1$-regularized logistic regression. We solve the latter with the $\ell_1$-projection method of Schmidt et al. [2007a]. While the training is independent, our goal is not the prediction for independent players but the characterization of joint-actions. The use of these well known techniques in our context is novel, since we interpret the output of SVMs and logistic regression as the parameters of an influence game. Therefore, we use the parameters to measure empirical and true proportion of equilibria, KL divergence and log-likelihood in our probabilistic model.
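The independent relaxation reduces to one standard $\ell_1$-regularized logistic regression per player. The label is $x_i$, the features are $\mathbf{x}_{-i}$, and the intercept is $-b_i$. The sketch below uses proximal gradient (ISTA) with soft-thresholding. This is a swapped-in solver, simpler than the $\ell_1$-projection method of Schmidt et al. [2007a] used in the paper. The step size $4/(d+1)$ is the inverse of a standard bound on the Lipschitz constant of the logistic gradient for $\pm 1$ features. The function names are hypothetical.

```python
import math


def player_problem(samples, i):
    """Labels x_i^(l) and features x_{-i}^(l) for player i's independent problem."""
    labels = [x[i] for x in samples]
    features = [[x[j] for j in range(len(x)) if j != i] for x in samples]
    return labels, features


def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def sigmoid_neg(z):
    """1 / (1 + e^{z}), computed without overflow."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def independent_l1_logistic(labels, features, rho, iters=500):
    """ISTA for (1/m) sum_l log(1 + exp(-y_l (w^T f_l - b))) + rho * ||w||_1.
    Returns (w, b) in the paper's convention, where the bias is subtracted."""
    m, d = len(labels), len(features[0])
    step = 4.0 / (d + 1)  # 1 / L with L <= (d + 1) / 4 for +-1 features
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for y, f in zip(labels, features):
            z = y * (sum(wk * fk for wk, fk in zip(w, f)) - b)
            s = sigmoid_neg(z) / m  # -d/dz log(1 + e^{-z}), averaged over samples
            for k in range(d):
                grad_w[k] -= s * y * f[k]  # dz/dw_k = y f_k
            grad_b += s * y                # dz/db = -y
        w = [soft_threshold(wk - step * gk, step * rho) for wk, gk in zip(w, grad_w)]
        b -= step * grad_b
    return w, b
```

Running `independent_l1_logistic(*player_problem(samples, i), rho)` for each player $i$ and stacking the rows gives the parameters of an influence game.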



Simultaneous Support Vector Machines. Using loose bounds converts the loss minimization problem in eq.(16) into several independent problems, each with a small number of variables. A second reasonable strategy is to use tighter bounds, at the expense of obtaining a single optimization problem with a larger number of variables.

For the hinge loss $\ell(z) = \max(0, 1 - z)$, we have $\max_i \ell(z_i) = \max(0, 1 - z_1, \ldots, 1 - z_n)$, and the loss minimization problem in eq.(16) becomes the following primal linear program:

$$
\begin{aligned}
\min_{\mathbf{W},\mathbf{b},\boldsymbol{\xi}} \quad & \frac{1}{m}\sum_l \xi_l + \rho\|\mathbf{W}\|_1 \\
\text{s.t.} \quad & (\forall l, i)\;\; x_i^{(l)}(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i}^{(l)} - b_i) \ge 1 - \xi_l, \quad (\forall l)\;\; \xi_l \ge 0
\end{aligned}
\tag{17}
$$

where $\rho > 0$.

Note that eq.(17) is equivalent to a linear program since we can set $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$, $\|\mathbf{W}\|_1 = \sum_{ij} (w^+_{ij} + w^-_{ij})$ and add the constraints $\mathbf{W}^+ \ge \mathbf{0}$ and $\mathbf{W}^- \ge \mathbf{0}$. We follow the regular SVM derivation by adding slack variables $\xi_l$ for each sample $l$. This problem is a generalization of 1-norm SVMs of Bradley and Mangasarian [1998], Zhu et al. [2003].
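The link between eq.(17) and eq.(16) can be made explicit. For fixed $(\mathbf{W}, \mathbf{b})$, the smallest feasible slack is $\xi_l = \max(0, \max_i(1 - z_i^{(l)}))$. Eliminating the slacks therefore recovers the max-hinge objective plus $\rho\|\mathbf{W}\|_1$. The sketch below evaluates the objective this way. It is not an LP solver, and the name is illustrative.

```python
def simultaneous_svm_objective(W, b, samples, rho):
    """Objective of eq. (17) at fixed (W, b) after eliminating the slacks: the smallest
    feasible slack is xi_l = max(0, max_i (1 - z_i^(l))), i.e. the max-hinge of eq. (16)."""
    m, n = len(samples), len(samples[0])
    slacks = []
    for x in samples:
        z = [x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) for i in range(n)]
        slacks.append(max(0.0, max(1.0 - zi for zi in z)))
    l1 = sum(abs(W[i][j]) for i in range(n) for j in range(n) if j != i)
    return sum(slacks) / m + rho * l1, slacks
```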



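By Lagrangian duality, the dual of the problem in eq.(17) is the following linear program:

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}} \quad & \sum_{l,i} \alpha_{li} \\
\text{s.t.} \quad & (\forall i)\;\; \Big\|\sum_l \alpha_{li}\, x_i^{(l)} \mathbf{x}_{-i}^{(l)}\Big\|_\infty \le \rho, \quad (\forall l, i)\;\; \alpha_{li} \ge 0 \\
& (\forall i)\;\; \sum_l \alpha_{li}\, x_i^{(l)} = 0, \quad (\forall l)\;\; \sum_i \alpha_{li} \le \frac{1}{m}
\end{aligned}
\tag{18}
$$

Furthermore, strong duality holds in this case. Note that eq.(18) is equivalent to a linear program since we can transform the constraint $\|\mathbf{c}\|_\infty \le \rho$ into $-\rho\mathbf{1} \le \mathbf{c} \le \rho\mathbf{1}$.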

Simultaneous Logistic Regression. For the logistic loss $\ell(z) = \log(1 + e^{-z})$, we could use the non-smooth loss $\max_i \ell(z_i)$ directly. Instead, we chose a smooth upper bound, i.e. $\log(1 + \sum_i e^{-z_i})$ (a discussion is included in the Appendix). The loss minimization problem in eq.(16) becomes:

$$\min_{\mathbf{W},\mathbf{b}} \frac{1}{m}\sum_l \log\Big(1 + \sum_i e^{-x_i^{(l)}(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i}^{(l)} - b_i)}\Big) + \rho\|\mathbf{W}\|_1 \tag{19}$$

where $\rho > 0$.
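The smooth loss in eq.(19) is sandwiched between the max loss and the independent relaxation: $\max_i \ell(z_i) \le \log(1 + \sum_i e^{-z_i}) \le \sum_i \ell(z_i)$. To see this, note that $1 + \sum_i e^{-z_i} \ge 1 + e^{-z_j}$ for every $j$, and that $\prod_i (1 + e^{-z_i}) \ge 1 + \sum_i e^{-z_i}$. The sketch below checks this numerically with a stable log-sum-exp. The names are hypothetical.

```python
import math


def logistic(z):
    """log(1 + e^{-z}), written to avoid overflow for large negative z."""
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def simultaneous_logistic(z):
    """log(1 + sum_i e^{-z_i}) via a shifted log-sum-exp over the terms {0, -z_1, ..., -z_n}."""
    terms = [0.0] + [-zi for zi in z]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```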
In our implementation, we use the $\ell_1$-projection method of Schmidt et al. [2007a] for optimizing eq.(19). This method performs a limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) step in an expanded model (i.e. $\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$, $\|\mathbf{W}\|_1 = \sum_{ij} (w^+_{ij} + w^-_{ij})$). The step is followed by a projection onto the non-negative orthant to enforce $\mathbf{W}^+ \ge \mathbf{0}$ and $\mathbf{W}^- \ge \mathbf{0}$.
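The expanded-model idea can be sketched with plain projected gradient steps in place of L-BFGS. This is a simplification, not the method of Schmidt et al. [2007a]. The step size and iteration count are untuned assumptions, and the function names are hypothetical. Each iteration does three things. It computes the gradient of the smooth part of eq.(19). It adds $\rho$ to the gradients of $\mathbf{W}^+$ and $\mathbf{W}^-$, whose entries sum to $\|\mathbf{W}\|_1$. It then clips both to the non-negative orthant.

```python
import math
from itertools import product


def train_simultaneous_logistic(samples, rho, step=0.1, iters=300):
    """Projected-gradient sketch for eq. (19) in the expanded model W = Wp - Wm, Wp, Wm >= 0.
    Plain gradient steps stand in for the L-BFGS steps of the l1-projection method."""
    m, n = len(samples), len(samples[0])
    Wp = [[0.0] * n for _ in range(n)]
    Wm = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for _ in range(iters):
        gW = [[0.0] * n for _ in range(n)]
        gb = [0.0] * n
        for x in samples:
            z = [x[i] * (sum((Wp[i][j] - Wm[i][j]) * x[j] for j in range(n) if j != i) - b[i])
                 for i in range(n)]
            top = max([0.0] + [-zi for zi in z])
            denom = math.exp(-top) + sum(math.exp(-zi - top) for zi in z)
            for i in range(n):
                s = math.exp(-z[i] - top) / denom  # = -d loss / d z_i
                for j in range(n):
                    if j != i:
                        gW[i][j] -= s * x[i] * x[j] / m  # d z_i / d w_ij = x_i x_j
                gb[i] += s * x[i] / m                    # d z_i / d b_i = -x_i
        for i in range(n):
            for j in range(n):
                if j != i:
                    # gradient of smooth part + rho * (Wp + Wm), then clip to the orthant
                    Wp[i][j] = max(0.0, Wp[i][j] - step * (gW[i][j] + rho))
                    Wm[i][j] = max(0.0, Wm[i][j] - step * (-gW[i][j] + rho))
            b[i] -= step * gb[i]
    W = [[Wp[i][j] - Wm[i][j] for j in range(n)] for i in range(n)]
    return W, b


def nash_equilibria(W, b):
    n = len(b)
    return {x for x in product((-1, 1), repeat=n)
            if all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0
                   for i in range(n))}
```

Suppose the data consist only of the two consensus actions, e.g. `[(1, 1, 1), (-1, -1, -1)] * 5`. The learned weights are then positive, and the resulting influence game has exactly those two equilibria.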



7 True Proportion of Equilibria 



In this section, we justify the use of convex loss minimization for learning the structure 
and parameters of influence games. We define absolute indifference of players and show 
that our convex loss minimization approach produces games in which all players are non- 
absolutely-indifferent. We then provide a bound of the true proportion of equilibria with 
high probability. Our bound only assumes independence of weight vectors among players. 
Our bound is distribution-free, i.e. we do not assume a specific distribution for the weight 
vector of each player. Furthermore, we do not assume any connectivity properties of the 
underlying graph. 



Parallel to our analysis, Daskalakis et al. [2011] analyzed a different setting. They considered random games whose structure is drawn from the Erdős-Rényi model (i.e. each edge is present independently with the same probability $p$) and whose utility functions are random tables. Their analysis is more general than ours, which only focuses on influence games. At the same time, it is more restricted, since it assumes either the Erdős-Rényi model for random structures or connectivity properties for deterministic structures.



7.1 Convex Loss Minimization Produces Non-Absolutely-Indifferent Players

First, we define the notion of absolute indifference of players. Our goal in this subsection is to show that our proposed convex loss algorithms produce influence games in which all players are non-absolutely-indifferent. Therefore, every player defines constraints on the true proportion of equilibria.

Definition 15. Given an influence game $\mathcal{G} = (\mathbf{W}, \mathbf{b})$, we say a player $i$ is absolutely indifferent if and only if $(\mathbf{w}_{i,-i}, b_i) = \mathbf{0}$, and non-absolutely-indifferent if and only if $(\mathbf{w}_{i,-i}, b_i) \ne \mathbf{0}$.

Next, we concentrate on the first ingredient for our bound of the true proportion of equilibria. We show that independent and simultaneous SVM and logistic regression produce games in which all players are non-absolutely-indifferent, except for some "degenerate" cases. The following lemma applies to independent SVMs for $c^{(l)} = 0$ and simultaneous SVMs for $c^{(l)} = \max(0, \max_{j \ne i}(1 - x_j^{(l)}(\mathbf{w}_{j,-j}^\top \mathbf{x}_{-j}^{(l)} - b_j)))$.

Lemma 16. Given $(\forall l)\ c^{(l)} \ge 0$, consider the minimization of the hinge training loss $\hat{\ell}(\mathbf{w}_{i,-i}, b_i) = \frac{1}{m}\sum_l \max(c^{(l)}, 1 - x_i^{(l)}(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i}^{(l)} - b_i))$. It guarantees non-absolutely-indifference of player $i$ except for some "degenerate" cases, i.e. the optimal solution $(\mathbf{w}^*_{i,-i}, b^*_i) = \mathbf{0}$ if and only if

$$(\forall j \ne i)\;\; \sum_l 1[x_i^{(l)} x_j^{(l)} = 1]\, u^{(l)} = \sum_l 1[x_i^{(l)} x_j^{(l)} = -1]\, u^{(l)} \quad\text{and}\quad \sum_l 1[x_i^{(l)} = 1]\, u^{(l)} = \sum_l 1[x_i^{(l)} = -1]\, u^{(l)}$$

where $u^{(l)}$ is defined as $c^{(l)} > 1 \Leftrightarrow u^{(l)} = 0$, $c^{(l)} < 1 \Leftrightarrow u^{(l)} = 1$ and $c^{(l)} = 1 \Leftrightarrow u^{(l)} \in [0; 1]$.
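Proof. Let $f_i(\mathbf{x}_{-i}) \equiv \mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i$. By noting that $\max(\alpha, \beta) = \max_{0 \le u \le 1}(\alpha + u(\beta - \alpha))$, we can rewrite $\hat{\ell}(\mathbf{w}_{i,-i}, b_i) = \frac{1}{m}\sum_l \max_{0 \le u^{(l)} \le 1}(c^{(l)} + u^{(l)}(1 - x_i^{(l)} f_i(\mathbf{x}_{-i}^{(l)}) - c^{(l)}))$.

Note that $\hat{\ell}$ has the minimizer $(\mathbf{w}^*_{i,-i}, b^*_i) = \mathbf{0}$ if and only if $\mathbf{0}$ belongs to the subdifferential set of the non-smooth function $\hat{\ell}$ at $(\mathbf{w}_{i,-i}, b_i) = \mathbf{0}$. For the inner maximization over $u^{(l)}$, we have $c^{(l)} > 1 - x_i^{(l)} f_i(\mathbf{x}_{-i}^{(l)}) \Leftrightarrow u^{(l)} = 0$, $c^{(l)} < 1 - x_i^{(l)} f_i(\mathbf{x}_{-i}^{(l)}) \Leftrightarrow u^{(l)} = 1$ and $c^{(l)} = 1 - x_i^{(l)} f_i(\mathbf{x}_{-i}^{(l)}) \Leftrightarrow u^{(l)} \in [0; 1]$. The previous rules simplify at the solution under analysis, since $(\mathbf{w}_{i,-i}, b_i) = \mathbf{0} \Rightarrow f_i(\mathbf{x}_{-i}^{(l)}) = 0$.

Let $g_j(\mathbf{w}_{i,-i}, b_i) \equiv \frac{\partial \hat{\ell}}{\partial w_{ij}}(\mathbf{w}_{i,-i}, b_i)$ and $h(\mathbf{w}_{i,-i}, b_i) \equiv \frac{\partial \hat{\ell}}{\partial b_i}(\mathbf{w}_{i,-i}, b_i)$. By making $(\forall j \ne i)\ 0 \in g_j(\mathbf{0}, 0)$ and $0 \in h(\mathbf{0}, 0)$, we get $(\forall j \ne i)\ \sum_l x_i^{(l)} x_j^{(l)} u^{(l)} = 0$ and $\sum_l x_i^{(l)} u^{(l)} = 0$. Finally, by noting that $x_i^{(l)} \in \{-1, 1\}$, we prove our claim. □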



Remark 17. Note that for independent SVMs, the "degenerate" cases in Lemma 16 simplify to $(\forall j \ne i)\ \sum_l 1[x_i^{(l)} x_j^{(l)} = 1] = \frac{m}{2}$ and $\sum_l 1[x_i^{(l)} = 1] = \frac{m}{2}$.

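The following lemma applies to independent logistic regression for $c^{(l)} = 0$ and simultaneous logistic regression for $c^{(l)} = \sum_{j \ne i} e^{-x_j^{(l)}(\mathbf{w}_{j,-j}^\top \mathbf{x}_{-j}^{(l)} - b_j)}$.

Lemma 18. Given $(\forall l)\ c^{(l)} \ge 0$, consider the minimization of the logistic training loss $\hat{\ell}(\mathbf{w}_{i,-i}, b_i) = \frac{1}{m}\sum_l \log(c^{(l)} + 1 + e^{-x_i^{(l)}(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i}^{(l)} - b_i)})$. It guarantees non-absolutely-indifference of player $i$ except for some "degenerate" cases, i.e. the optimal solution $(\mathbf{w}^*_{i,-i}, b^*_i) = \mathbf{0}$ if and only if

$$(\forall j \ne i)\;\; \sum_l \frac{1[x_i^{(l)} x_j^{(l)} = 1]}{c^{(l)} + 2} = \sum_l \frac{1[x_i^{(l)} x_j^{(l)} = -1]}{c^{(l)} + 2} \quad\text{and}\quad \sum_l \frac{1[x_i^{(l)} = 1]}{c^{(l)} + 2} = \sum_l \frac{1[x_i^{(l)} = -1]}{c^{(l)} + 2}$$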
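Proof. Note that $\hat{\ell}$ has the minimizer $(\mathbf{w}^*_{i,-i}, b^*_i) = \mathbf{0}$ if and only if the gradient of the smooth function $\hat{\ell}$ is $\mathbf{0}$ at $(\mathbf{w}_{i,-i}, b_i) = \mathbf{0}$. Let $g_j(\mathbf{w}_{i,-i}, b_i) \equiv \frac{\partial \hat{\ell}}{\partial w_{ij}}(\mathbf{w}_{i,-i}, b_i)$ and $h(\mathbf{w}_{i,-i}, b_i) \equiv \frac{\partial \hat{\ell}}{\partial b_i}(\mathbf{w}_{i,-i}, b_i)$. By making $(\forall j \ne i)\ g_j(\mathbf{0}, 0) = 0$ and $h(\mathbf{0}, 0) = 0$, we get $(\forall j \ne i)\ \sum_l \frac{x_i^{(l)} x_j^{(l)}}{c^{(l)} + 2} = 0$ and $\sum_l \frac{x_i^{(l)}}{c^{(l)} + 2} = 0$. Finally, by noting that $x_i^{(l)} \in \{-1, 1\}$, we prove our claim. □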

Remark 19. Note that for independent logistic regression, the "degenerate" cases in Lemma 18 simplify to $(\forall j \ne i)\ \sum_l 1[x_i^{(l)} x_j^{(l)} = 1] = \frac{m}{2}$ and $\sum_l 1[x_i^{(l)} = 1] = \frac{m}{2}$.



Based on these results, after termination of our proposed algorithms, we fix cases in which the optimal solution is $(\mathbf{w}^*_{i,-i}, b^*_i) = \mathbf{0}$. We set $b^*_i = 1$ if the action of player $i$ was mostly $-1$, and $b^*_i = -1$ otherwise. We point out to the careful reader that we did not include the $\ell_1$-regularization term in the above proofs, since the subdifferential of $\rho\|\mathbf{w}_{i,-i}\|_1$ vanishes at $\mathbf{w}_{i,-i} = \mathbf{0}$, and therefore our proofs still hold.
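The degenerate conditions of Remarks 17 and 19 and the post-processing rule above are easy to code. In this sketch, ties in the "mostly $-1$" test fall into the "otherwise" branch, which is our assumption. The function names are hypothetical. With $\mathbf{w}_{i,-i} = \mathbf{0}$ and $b_i = 1$, the condition $x_i(0 - b_i) \ge 0$ forces $x_i = -1$. The fix therefore makes player $i$ play its majority action.

```python
def is_degenerate_independent(samples, i):
    """Remarks 17 and 19: player i's independent SVM / logistic solution is (w, b) = 0 iff
    x_i = 1 in exactly half the samples and x_i x_j = 1 in exactly half for every j != i."""
    m, n = len(samples), len(samples[0])
    if 2 * sum(1 for x in samples if x[i] == 1) != m:
        return False
    return all(2 * sum(1 for x in samples if x[i] * x[j] == 1) == m
               for j in range(n) if j != i)


def fix_absolutely_indifferent(W, b, samples):
    """Post-processing: for any player with (w_{i,-i}, b_i) = 0, set b_i = 1 if the player
    mostly played -1 and b_i = -1 otherwise (ties fall into 'otherwise' in this sketch)."""
    n = len(b)
    fixed = list(b)
    for i in range(n):
        if b[i] == 0 and all(W[i][j] == 0 for j in range(n) if j != i):
            minus = sum(1 for x in samples if x[i] == -1)
            fixed[i] = 1.0 if minus > len(samples) - minus else -1.0
    return fixed
```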

7.2 Bounding the True Proportion of Equilibria 

In what follows, we concentrate on the second ingredient for our bound of the true proportion of equilibria. We show that for a game with a single non-absolutely-indifferent player, the true proportion of equilibria is bounded by 3/4.
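Lemma 20. Given an influence game $\mathcal{G} = (\mathbf{W}, \mathbf{b})$ with non-absolutely-indifferent player $i$ and absolutely-indifferent players $\forall j \ne i$, the following statements hold:

i. $\mathbf{x} \in \mathcal{NE}(\mathcal{G}) \Leftrightarrow x_i(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i) \ge 0$

ii. $|\mathcal{NE}(\mathcal{G})| = 2^{n-1} + \sum_{\mathbf{x}_{-i}} 1[\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i = 0]$ (20)

iii. $\frac{1}{2} \le \pi(\mathcal{G}) \le \frac{3}{4}$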

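Proof. Let $f_i(\mathbf{x}_{-i}) \equiv \mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i$. For Claim i, note that $1[\mathbf{x} \in \mathcal{NE}(\mathcal{G})] = \min_j 1[x_j f_j(\mathbf{x}_{-j}) \ge 0] = 1[x_i f_i(\mathbf{x}_{-i}) \ge 0] \min_{j \ne i} 1[x_j f_j(\mathbf{x}_{-j}) \ge 0]$. Since all players except $i$ are absolutely-indifferent, we have $(\forall j \ne i)\ (\mathbf{w}_{j,-j}, b_j) = \mathbf{0} \Rightarrow f_j(\mathbf{x}_{-j}) = 0$, which implies that $\min_{j \ne i} 1[x_j f_j(\mathbf{x}_{-j}) \ge 0] = 1$. Therefore, $1[\mathbf{x} \in \mathcal{NE}(\mathcal{G})] = 1[x_i f_i(\mathbf{x}_{-i}) \ge 0]$.

For Claim ii, by Claim i we have $|\mathcal{NE}(\mathcal{G})| = \sum_{\mathbf{x}} 1[x_i f_i(\mathbf{x}_{-i}) \ge 0]$. We can rewrite $|\mathcal{NE}(\mathcal{G})| = \sum_{\mathbf{x}} 1[x_i = +1]\, 1[f_i(\mathbf{x}_{-i}) \ge 0] + \sum_{\mathbf{x}} 1[x_i = -1]\, 1[f_i(\mathbf{x}_{-i}) \le 0]$, or equivalently $|\mathcal{NE}(\mathcal{G})| = \sum_{\mathbf{x}_{-i}} 1[f_i(\mathbf{x}_{-i}) \ge 0] + \sum_{\mathbf{x}_{-i}} 1[f_i(\mathbf{x}_{-i}) \le 0] = 2^{n-1} + \sum_{\mathbf{x}_{-i}} 1[f_i(\mathbf{x}_{-i}) = 0]$.

For Claim iii, by the definition of $\pi(\mathcal{G})$ and Claim ii we have $\pi(\mathcal{G}) = \frac{|\mathcal{NE}(\mathcal{G})|}{2^n} = \frac{1}{2} + \frac{1}{2^n}\alpha(\mathbf{w}_{i,-i}, b_i)$, where $\alpha(\mathbf{w}_{i,-i}, b_i) \equiv \sum_{\mathbf{x}_{-i}} 1[\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i = 0]$. This proves the lower bound $\pi(\mathcal{G}) \ge \frac{1}{2}$. Geometrically speaking, $\alpha(\mathbf{w}_{i,-i}, b_i)$ is the number of vertices of the $(n-1)$-dimensional hypercube that are covered by the hyperplane with normal $\mathbf{w}_{i,-i}$ and bias $b_i$. Recall that $(\mathbf{w}_{i,-i}, b_i) \ne \mathbf{0}$. If $\mathbf{w}_{i,-i} = \mathbf{0}$ and $b_i \ne 0$, then $\alpha(\mathbf{w}_{i,-i}, b_i) = \sum_{\mathbf{x}_{-i}} 1[b_i = 0] = 0 \Rightarrow \pi(\mathcal{G}) = \frac{1}{2}$. If $\mathbf{w}_{i,-i} \ne \mathbf{0}$, then as noted in Aichholzer and Aurenhammer [1996], a hyperplane with $n-2$ zeros on $\mathbf{w}_{i,-i}$ (i.e. an $(n-2)$-parallel hyperplane) covers exactly half of the $2^{n-1}$ vertices, the maximum possible. Therefore, $\pi(\mathcal{G}) = \frac{1}{2} + \frac{1}{2^n}\alpha(\mathbf{w}_{i,-i}, b_i) \le \frac{1}{2} + \frac{2^{n-2}}{2^n} = \frac{3}{4}$. □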
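Lemma 20 can be checked by brute force for small $n$. For $n = 4$, the sketch below enumerates every nonzero row with entries in $\{-1, 0, 1\}$ and reports the range of $\pi(\mathcal{G})$. The names are illustrative. The $(n-2)$-parallel hyperplane $x_1 = 1$, i.e. weights $(0,1,0,0)$ and bias 1, attains the maximum $3/4$.

```python
from itertools import product


def true_proportion(W, b):
    """pi(G) = |NE(G)| / 2^n, by enumerating {-1,+1}^n."""
    n = len(b)
    count = sum(
        all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0 for i in range(n))
        for x in product((-1, 1), repeat=n))
    return count / 2 ** n


def single_player_game(n, i, w_row, b_i):
    """Only player i is non-absolutely-indifferent: row i is w_row (entry i ignored), bias b_i."""
    W = [[0] * n for _ in range(n)]
    for j in range(n):
        if j != i:
            W[i][j] = w_row[j]
    b = [0] * n
    b[i] = b_i
    return W, b


def check_lemma20(n=4, i=0, values=(-1, 0, 1)):
    """Return (min, max) of pi(G) over all nonzero integer rows with entries in `values`."""
    pis = []
    for params in product(values, repeat=n):  # n - 1 weights followed by the bias
        if all(v == 0 for v in params):
            continue  # absolutely indifferent, excluded by Lemma 20
        w_row = list(params[:i]) + [0] + list(params[i:n - 1])
        W, b = single_player_game(n, i, w_row, params[n - 1])
        pis.append(true_proportion(W, b))
    return min(pis), max(pis)
```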



Next, we present our bound for the true proportion of equilibria of games in which all players are non-absolutely-indifferent.

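Theorem 21. Assume that all players are non-absolutely-indifferent. Assume also that the rows of an influence game $\mathcal{G} = (\mathbf{W}, \mathbf{b})$ are independent (but not necessarily identically distributed) random vectors, i.e. for every player $i$, $(\mathbf{w}_{i,-i}, b_i)$ is independently drawn from an arbitrary distribution $\mathcal{P}_i$. Then the expected true proportion of equilibria is bounded as follows:

$$(1/2)^n \le \mathbb{E}_{\mathcal{P}_1,\ldots,\mathcal{P}_n}[\pi(\mathcal{G})] \le (3/4)^n \tag{21}$$

Furthermore, the following high probability statement holds:

$$\mathbb{P}_{\mathcal{P}_1,\ldots,\mathcal{P}_n}\left[\pi(\mathcal{G}) \le \frac{(3/4)^n}{\delta}\right] \ge 1 - \delta \tag{22}$$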

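Proof. Let $y_i \equiv 1[x_i(\mathbf{w}_{i,-i}^\top \mathbf{x}_{-i} - b_i) \ge 0]$, $\mathcal{P} \equiv \{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$ and $\mathcal{U}$ the uniform distribution for $\mathbf{x} \in \{-1,+1\}^n$. By the definition of $\pi(\mathcal{G})$, $\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})] = \mathbb{E}_{\mathcal{P}}[\mathbb{E}_{\mathcal{U}}[\prod_i y_i]] = \mathbb{E}_{\mathcal{U}}[\mathbb{E}_{\mathcal{P}}[\prod_i y_i]]$. Note that each $y_i$ is independent since each $(\mathbf{w}_{i,-i}, b_i)$ is independently distributed. Therefore, $\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})] = \mathbb{E}_{\mathcal{U}}[\prod_i \mathbb{E}_{\mathcal{P}_i}[y_i]]$. Similarly, each $z_i \equiv \mathbb{E}_{\mathcal{P}_i}[y_i]$ is independent since each $(\mathbf{w}_{i,-i}, b_i)$ is independently distributed. Therefore, $\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})] = \mathbb{E}_{\mathcal{U}}[\prod_i z_i] = \prod_i \mathbb{E}_{\mathcal{U}}[z_i] = \prod_i \mathbb{E}_{\mathcal{U}}[\mathbb{E}_{\mathcal{P}_i}[y_i]] = \prod_i \mathbb{E}_{\mathcal{P}_i}[\mathbb{E}_{\mathcal{U}}[y_i]]$. Note that $\mathbb{E}_{\mathcal{U}}[y_i]$ is the true proportion of equilibria of an influence game with non-absolutely-indifferent player $i$ and absolutely-indifferent players $\forall j \ne i$. Therefore $1/2 \le \mathbb{E}_{\mathcal{U}}[y_i] \le 3/4$ by Claim iii of Lemma 20. Finally, we have $\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})] \ge \prod_i \mathbb{E}_{\mathcal{P}_i}[1/2] = (1/2)^n$ and similarly $\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})] \le \prod_i \mathbb{E}_{\mathcal{P}_i}[3/4] = (3/4)^n$.

By Markov's inequality, given that $\pi(\mathcal{G}) \ge 0$, we have $\mathbb{P}_{\mathcal{P}}[\pi(\mathcal{G}) \ge c] \le \frac{\mathbb{E}_{\mathcal{P}}[\pi(\mathcal{G})]}{c} \le \frac{(3/4)^n}{c}$. For $c = \frac{(3/4)^n}{\delta}$ we get $\mathbb{P}_{\mathcal{P}}[\pi(\mathcal{G}) \ge \frac{(3/4)^n}{\delta}] \le \delta \Rightarrow \mathbb{P}_{\mathcal{P}}[\pi(\mathcal{G}) \le \frac{(3/4)^n}{\delta}] \ge 1 - \delta$. □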
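The following is a Monte Carlo illustration of Theorem 21. It assumes that every entry of each row $(\mathbf{w}_{i,-i}, b_i)$ is i.i.d. standard normal, which is one arbitrary choice of the distributions $\mathcal{P}_i$. For such symmetric, continuous distributions, each $y_i$ equals 1 with probability 1/2 for every fixed $\mathbf{x}$. The average of $\pi(\mathcal{G})$ should therefore be close to the lower end $(1/2)^n$ of eq.(21). In addition, the fraction of games satisfying eq.(22) should be at least $1 - \delta$. The names and defaults are hypothetical.

```python
import random
from itertools import product


def true_proportion(W, b):
    """pi(G) = |NE(G)| / 2^n, by enumerating {-1,+1}^n."""
    n = len(b)
    count = sum(
        all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0 for i in range(n))
        for x in product((-1, 1), repeat=n))
    return count / 2 ** n


def sample_game(n, rng):
    """Rows (w_{i,-i}, b_i) drawn independently; i.i.d. standard normal entries are an arbitrary choice."""
    W = [[0.0 if i == j else rng.gauss(0.0, 1.0) for j in range(n)] for i in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return W, b


def monte_carlo(n=4, games=2000, delta=0.5, seed=0):
    """Average pi(G) and the fraction of games with pi(G) <= (3/4)^n / delta."""
    rng = random.Random(seed)
    pis = [true_proportion(*sample_game(n, rng)) for _ in range(games)]
    mean = sum(pis) / games
    frac = sum(p <= 0.75 ** n / delta for p in pis) / games
    return mean, frac
```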
Remark 22. Under the same assumptions as Theorem 21, Hoeffding's lemma can be used to prove a similar statement. With probability at least $1 - \delta$, $\pi(\mathcal{G})$ exceeds $(3/4)^n$ by at most an additive deviation term that depends on $\log(1/\delta)$. We point out that such a bound is not better than the Markov bound derived above.



8 Experimental Results 



For learning influence games we used our convex loss methods: independent and simultaneous SVM and logistic regression. Additionally, we used the (super-exponential) exhaustive search method only for $n \le 4$. As baselines, we used the sigmoidal maximum likelihood (NP-hard) only for $n \le 15$, as well as the sigmoidal maximum empirical proportion of equilibria. Regarding the parameters $\alpha$ and $\beta$ of our sigmoidal function in eq.(15), we found experimentally that $\alpha = 0.1$ and $\beta = 0.001$ achieved the best results.

We compare learning influence games to learning Ising models. For $n \le 15$ players, we perform exact $\ell_1$-regularized maximum likelihood estimation by using the FOBOS algorithm [Duchi and Singer, 2009a,b] and exact gradients of the log-likelihood of the Ising model. Since the computation of the exact gradient at each step is NP-hard, we used this method only for $n \le 15$. For $n > 15$ players, we use the Höfling-Tibshirani method [Höfling and Tibshirani, 2009], which uses a sequence of first-order approximations of the exact log-likelihood. We also used a two-step algorithm. It first learns the structure by $\ell_1$-regularized logistic regression [Wainwright et al., 2006]. It then applies the FOBOS algorithm [Duchi and Singer, 2009a,b] with belief propagation for gradient approximation. We did not find a statistically significant difference between the test log-likelihood of both algorithms, and therefore we only report the latter.

Our experimental setup is as follows. We learn a model for different values of the regularization parameter $\rho$ in a training set. We then select the value of $\rho$ that maximizes the log-likelihood in a validation set, and report statistics in a test set. For synthetic experiments, we measure the closeness of the recovered models to the ground truth. We report the Kullback-Leibler (KL) divergence, average precision (one minus the fraction of falsely included equilibria) and average recall (one minus the fraction of falsely excluded equilibria). For real-world experiments, we report the log-likelihood. In both synthetic and real-world experiments, we report the number of equilibria and the empirical proportion of equilibria.
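The following is a sketch of the synthetic evaluation metrics. It assumes the mixture generative model of the paper, which places mass $q/|\mathcal{NE}|$ on each equilibrium and $(1-q)/(2^n - |\mathcal{NE}|)$ on each non-equilibrium, with $0 < q < 1$. The KL divergence is summed exactly over all $2^n$ joint actions, which is feasible only for small $n$. The direction $KL(p_{\text{true}} \| p_{\text{learned}})$ is our assumption, and the helper names are hypothetical.

```python
import math
from itertools import product


def mixture_pmf(ne, q, n):
    """p_(G,q)(x) = q/|NE| on NE and (1-q)/(2^n - |NE|) elsewhere; assumes 0 < |NE| < 2^n, 0 < q < 1."""
    k = len(ne)
    return {x: (q / k if x in ne else (1.0 - q) / (2 ** n - k))
            for x in product((-1, 1), repeat=n)}


def kl_between_games(ne_true, q_true, ne_hat, q_hat, n):
    """KL(p_true || p_hat), summed exactly over the 2^n joint actions."""
    p = mixture_pmf(ne_true, q_true, n)
    r = mixture_pmf(ne_hat, q_hat, n)
    return sum(p[x] * math.log(p[x] / r[x]) for x in p if p[x] > 0)


def equilibrium_precision_recall(ne_true, ne_hat):
    hits = len(ne_true & ne_hat)
    precision = hits / len(ne_hat)  # one minus the fraction of falsely included equilibria
    recall = hits / len(ne_true)    # one minus the fraction of falsely excluded equilibria
    return precision, recall
```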

We first test the ability of the proposed methods to recover the ground truth structure from data. We use a small first synthetic model in order to compare with the (super-exponential) exhaustive search method. The ground truth model $\mathcal{G}_g = (\mathbf{W}_g, \mathbf{b}_g)$ has $n = 4$ players and 4 Nash equilibria (i.e. $\pi(\mathcal{G}_g) = 0.25$). It was set according to Figure 3 (the weight of each edge was set to +1), with $\mathbf{b}_g = \mathbf{0}$. The mixture parameter of the ground truth $q_g$ was set to 0.5, 0.7, 0.9. For each of 50 repetitions, we generated a training, a validation and a test set of 50 samples each. Figure 3 shows that our convex loss methods and sigmoidal maximum likelihood outperform (lower KL) exhaustive search, sigmoidal maximum empirical proportion of equilibria and Ising models. Note that the exhaustive search method, which performs exact maximum likelihood, suffers from over-fitting and consequently does not produce the lowest KL. Among all convex loss methods, simultaneous logistic regression achieves the lowest KL. For all methods, the recovery of equilibria is perfect for $q_g = 0.9$ (number of equilibria equal to the ground truth, equilibrium precision and recall equal to 1). Additionally, the empirical proportion of equilibria resembles the mixture parameter of the ground truth $q_g$.

Figure 3: Closeness of the recovered models to the ground truth synthetic model for different mixture parameters $q_g$. Our convex loss methods (IS, SS: independent and simultaneous SVM; IL, SL: independent and simultaneous logistic regression) and sigmoidal maximum likelihood (S1) have lower KL than exhaustive search (EX), sigmoidal maximum empirical proportion of equilibria (S2) and Ising models (IM). For all methods, the recovery of equilibria is perfect for $q_g = 0.9$ (number of equilibria equal to the ground truth, equilibrium precision and recall equal to 1) and the empirical proportion of equilibria resembles the mixture parameter of the ground truth $q_g$.

Next, we use a relatively larger second synthetic model with more complex interactions. We still keep the model small enough in order to compare with the (NP-hard) sigmoidal maximum likelihood method. The ground truth model $\mathcal{G}_g = (\mathbf{W}_g, \mathbf{b}_g)$ has $n = 9$ players and 16 Nash equilibria (i.e. $\pi(\mathcal{G}_g) = 0.03125$). It was set according to Figure 4 (the weight of each blue and red edge was set to $+1$ and $-1$ respectively), with $\mathbf{b}_g = \mathbf{0}$. The mixture parameter of the ground truth $q_g$ was set to 0.5, 0.7, 0.9. For each of 50 repetitions, we generated a training, a validation and a test set of 50 samples each. Figure 4 shows that our convex loss methods outperform (lower KL) sigmoidal methods and Ising models. Among all convex loss methods, simultaneous logistic regression achieves the lowest KL. For convex loss methods, the equilibrium recovery is better than the remaining methods (number of equilibria equal to the ground truth, higher equilibrium precision and recall). Additionally, the empirical proportion of equilibria resembles the mixture parameter of the ground truth $q_g$.

In the next experiment, we show that the performance of convex loss minimization improves as the number of samples increases. We used random graphs with slightly more variables and varying number of samples (10, 30, 100, 300). The ground truth model $\mathcal{G}_g = (\mathbf{W}_g, \mathbf{b}_g)$ contains $n = 20$ players. For each of 20 repetitions, we generate edges in the ground truth model with a required density (either 0.2, 0.5, 0.8). For simplicity, the weight of each edge is set to $+1$ with probability $P(+1)$ and to $-1$ with probability $1 - P(+1)$. Hence, the Nash equilibria of the generated games do not depend on the magnitude of the weights, just on their sign. We set the bias $\mathbf{b}_g = \mathbf{0}$ and the mixture parameter of the ground truth $q_g = 0.7$. We then generated a training and a validation set with the same number of samples. Figure 5 shows that our convex loss methods outperform (lower KL) sigmoidal maximum empirical proportion of equilibria and Ising models. The exception is the synthetic model with high true proportion of equilibria (density 0.8, $P(+1) = 0$, NE $> 1000$). The results are remarkably better when the number of equilibria in the ground truth model is small (e.g. for NE $< 20$). Among all convex loss methods, simultaneous logistic regression achieves the lowest KL.

Figure 4: Closeness of the recovered models to the ground truth synthetic model for different mixture parameters $q_g$. Our convex loss methods (IS, SS: independent and simultaneous SVM; IL, SL: independent and simultaneous logistic regression) have lower KL than sigmoidal maximum likelihood (S1), sigmoidal maximum empirical proportion of equilibria (S2) and Ising models (IM). For convex loss methods, the equilibrium recovery is better than the remaining methods (number of equilibria equal to the ground truth, higher equilibrium precision and recall) and the empirical proportion of equilibria resembles the mixture parameter of the ground truth $q_g$.
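The generation of ground truth games and data can be sketched as follows. Two details are assumptions not stated in the text. First, "density" is read as the probability that each ordered pair $(i, j)$ receives an edge. Second, samples are drawn from the mixture model: with probability $q_g$ a uniform equilibrium, otherwise a uniform non-equilibrium. The names are hypothetical.

```python
import random
from itertools import product


def random_ground_truth(n, density, p_plus, rng):
    """Each ordered pair (i, j), i != j, gets an edge with probability `density`; an edge weight
    is +1 with probability p_plus and -1 otherwise; the bias is 0."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < density:
                W[i][j] = 1 if rng.random() < p_plus else -1
    return W, [0] * n


def nash_equilibria(W, b):
    n = len(b)
    return {x for x in product((-1, 1), repeat=n)
            if all(x[i] * (sum(W[i][j] * x[j] for j in range(n) if j != i) - b[i]) >= 0
                   for i in range(n))}


def sample_mixture(ne, q, n, m, rng):
    """Draw m joint actions: with probability q a uniform equilibrium, else a uniform non-equilibrium."""
    equilibria = sorted(ne)
    others = sorted(set(product((-1, 1), repeat=n)) - set(ne))
    if not equilibria or not others:
        raise ValueError("need 0 < |NE| < 2^n to sample from the mixture")
    return [rng.choice(equilibria) if rng.random() < q else rng.choice(others) for _ in range(m)]
```

With density 1 and $P(+1) = 1$, the game is a complete graph of positive influences. For $n = 4$, its only equilibria are the two consensus actions. This matches the two-equilibria case discussed in the next experiment.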

In the next experiment, we evaluate two effects in our approximation methods. First, we evaluate the impact of removing the true proportion of equilibria from our objective function, i.e. the use of maximum empirical proportion of equilibria instead of maximum likelihood. Second, we evaluate the impact of using convex losses instead of a sigmoidal approximation of the 0/1 loss. We used random graphs with varying number of players and 50 samples. The ground truth model $\mathcal{G}_g = (\mathbf{W}_g, \mathbf{b}_g)$ contains $n = 4, 6, 8, 10, 12$ players. For each of 20 repetitions, we generate edges in the ground truth model $\mathbf{W}_g$ with a required density (either 0.2, 0.5, 0.8). As in the previous experiment, the weight of each edge is set to $+1$ with probability $P(+1)$ and to $-1$ with probability $1 - P(+1)$. We set the bias $\mathbf{b}_g = \mathbf{0}$ and the mixture parameter of the ground truth $q_g = 0.7$. We then generated a training and a validation set with the same number of samples. Figure 6 shows that in general, convex loss methods outperform (lower KL) sigmoidal maximum empirical proportion of equilibria, and the latter outperforms sigmoidal maximum likelihood. A different effect is observed for mild (0.5) to high (0.8) density and $P(+1) = 1$, where the sigmoidal maximum likelihood obtains the lowest KL. On closer inspection, we found that the ground truth games usually have only 2 equilibria, $(+1, \ldots, +1)$ and $(-1, \ldots, -1)$, which seems to present a challenge for convex loss methods. For these specific cases, removing the true proportion of equilibria from the objective function seems to negatively impact the estimation process. Note, however, that sigmoidal maximum likelihood is not computationally feasible for $n > 15$.

Figure 5: KL divergence between the recovered models and the ground truth for datasets of different number of samples. Each chart shows the density of the ground truth, probability $P(+1)$ that an edge has weight $+1$, and average number of equilibria (NE). Our convex loss methods (IS, SS: independent and simultaneous SVM; IL, SL: independent and simultaneous logistic regression) have lower KL than sigmoidal maximum empirical proportion of equilibria (S2) and Ising models (IM). The results are remarkably better when the number of equilibria in the ground truth model is small (e.g. for NE $< 20$).

We used the U.S. congressional voting records in order to measure the generalization 
performance of convex loss minimization in a real- world dataset. The dataset is publicly 
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Figure 6: KL divergence between the recovered models and the ground truth for datasets of different number 
of players. Each chart shows the density of the ground truth, probability P(+l) that an edge has weight 
+1, and average number of equilibria (NE) for n = 2;n = 14. In general, simultaneous logistic regression 
(SL) has lower KL than sigmoidal maximum empirical proportion of equilibria (S2), and the latter one has 
lower KL than sigmoidal maximum likelihood (SI). Other convex losses behave the same as simultaneous 
logistic regression (omitted for clarity of presentation). 



available at http: //www. senate. gov/. We used the first session of the 104th congress (Jan 
1995 to Jan 1996, 613 votes), the first session of the 107th congress (Jan 2001 to Dec 2001, 
380 votes) and the second session of the 110th congress (Jan 2008 to Jan 2009, 215 votes). 



Following other researchers who have experimented with this dataset (e.g., Banerjee et al., 2008), we replaced abstentions with negative votes. Since reporting the log-likelihood requires computing the number of equilibria (which is NP-hard), we selected only 20 senators by stratified random sampling. We randomly split the data into three parts and performed six repetitions, letting each third of the data take turns as the training, validation and testing set. Figure 7 shows that our convex loss methods outperform (i.e., achieve higher log-likelihood than) sigmoidal maximum empirical proportion of equilibria and Ising models. Among all convex loss methods, simultaneous logistic regression achieves the lowest KL. For all methods, the number of equilibria (and so the true proportion of equilibria) is low.

Figure 7: Statistics for games learnt from 20 senators from the first session of the 104th congress, the first session of the 107th congress and the second session of the 110th congress. The log-likelihood of our convex loss methods (IS, SS: independent and simultaneous SVM; IL, SL: independent and simultaneous logistic regression) is higher than that of sigmoidal maximum empirical proportion of equilibria (S2) and Ising models (IM). For all methods, the number of equilibria (and so the true proportion of equilibria) is low.
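The evaluation protocol described above (three random thirds, each taking turns as training, validation and testing set, giving 3! = 6 repetitions) can be sketched as follows. This is a minimal illustration that assumes a single random shuffle of the vote indices; the function name `rotation_splits` is hypothetical, and the stratified selection of the 20 senators is not shown.

```python
import itertools

import numpy as np


def rotation_splits(num_samples, seed=0):
    """Yield (train, validation, test) index arrays for all 3! = 6 role assignments.

    The sample indices are shuffled once and cut into three near-equal thirds;
    each permutation of the thirds assigns them to the three roles.
    """
    rng = np.random.default_rng(seed)
    thirds = np.array_split(rng.permutation(num_samples), 3)
    for train, val, test in itertools.permutations(range(3)):
        yield thirds[train], thirds[val], thirds[test]


# Example: the 613 votes of the first session of the 104th congress.
splits = list(rotation_splits(613))
print(len(splits))  # 6 repetitions
```

Each third serves as the training set in exactly two of the six repetitions, once with each of the remaining thirds used for validation.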

We apply convex loss minimization to larger problems by learning structures of games from all 100 senators. Figure 8 shows that simultaneous logistic regression produces structures that are sparser than those of its independent counterpart. The simultaneous method better elicits the two-party structure of the congress. We define the influence of player $j$ on all other players as $\sum_i |w_{ij}|$ after normalizing all weights, i.e., for each player $i$ we divide $(\mathbf{w}_{i,-i}, b_i)$ by $\|\mathbf{w}_{i,-i}\|_1 + |b_i|$. Note that Jeffords and Clinton are among the 5 most directly-influential as well as the 5 least directly-influenceable (high bias) senators, in the 107th and 110th congress respectively. McCain and Feingold are both in the list of the 5 most directly-influential senators in the 104th and 107th congress. McCain appears again in the list of the 5 least directly-influenceable senators in the 110th congress.
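The normalization and influence score defined above fit in a few lines of NumPy. This is a sketch under stated assumptions: row $i$ of the hypothetical input `W` stores $\mathbf{w}_{i,-i}$ (so $w_{ij}$ is the weight player $i$ places on player $j$, and the diagonal is ignored), `b` stores the biases, and an all-zero row (an absolutely-indifferent player) is left unnormalized.

```python
import numpy as np


def direct_influence(W, b):
    """Normalize each (w_{i,-i}, b_i) by ||w_{i,-i}||_1 + |b_i| and score influence.

    Returns (influence, W_norm, b_norm), where influence[j] = sum_i |W_norm[i, j]|
    is the direct influence of player j on all other players.
    """
    W = np.array(W, dtype=float)          # copy, so the caller's matrix is untouched
    b = np.array(b, dtype=float)
    np.fill_diagonal(W, 0.0)              # w_{i,-i} excludes the self-weight
    scale = np.abs(W).sum(axis=1) + np.abs(b)
    scale[scale == 0] = 1.0               # absolutely-indifferent player: nothing to normalize
    W_norm = W / scale[:, None]
    b_norm = b / scale
    influence = np.abs(W_norm).sum(axis=0)  # column sums: influence of j on everyone else
    return influence, W_norm, b_norm
```

Under this reading, the most directly-influential senators are the largest entries of `influence` (e.g. `np.argsort(-influence)[:5]`), and the least directly-influenceable ones are those with the largest normalized bias `np.abs(b_norm)`.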

We test the hypothesis that influence between senators of the same party is stronger than influence between senators of different parties. We learn structures of games from all 100 senators from the 101st congress to the 111th congress (Jan 1989 to Dec 2010). The number of votes cast per session was 337 on average (minimum 215, maximum 613). Figure 9 supports our hypothesis and, more interestingly, shows that influence between different parties is decreasing over time. Note that the influence from Obama to Republicans increased in the last sessions, while McCain's influence to Republicans decreased.
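A party-level summary such as the one in Figure 9 can be computed from a normalized weight matrix. The exact aggregation behind Figure 9 is not spelled out in this section, so treat the rule below (average absolute normalized weight over each party block, excluding self-weights) and the hypothetical `party_influence` helper as assumptions made for illustration.

```python
import numpy as np


def party_influence(W_norm, party):
    """Average |W_norm[i, j]| from senators j of party `src` to senators i of party `dst`.

    Assumption: W_norm[i, j] is the normalized influence of senator j on senator i
    (rows are the influenced senators, as in Figure 8), and party[i] is a label such
    as "D" or "R". Diagonal entries are excluded.
    """
    W_abs = np.abs(np.asarray(W_norm, dtype=float))
    party = np.asarray(party)
    off_diag = ~np.eye(len(party), dtype=bool)
    summary = {}
    for src in sorted(set(party.tolist())):        # party exerting influence (columns)
        for dst in sorted(set(party.tolist())):    # party being influenced (rows)
            mask = (party == dst)[:, None] & (party == src)[None, :] & off_diag
            summary[(src, dst)] = float(W_abs[mask].mean()) if mask.any() else 0.0
    return summary
```

Comparing `summary[("D", "D")]` and `summary[("R", "R")]` with the cross-party entries across sessions would give the kind of trend plotted in Figure 9.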



9 Discussion 



It is important to point out that our work is not in competition with the work in probabilistic graphical models, e.g. Ising models. Our goal is to learn the structure and parameters of games from data, and to this end we propose a probabilistic model that is inspired by the concept of equilibrium in game theory. While we illustrate the benefit of our model on the U.S. congressional voting records, we believe that each model has its own benefits. If the practitioner "believes" that the data at hand is generated by a class of models, then the interpretation of the learnt model provides insight into the problem at hand. Note that none of the existing models (including ours) can be validated as the ground truth model that generated the real-world data, or as being more or less "realistic" than other models. While generalization to unseen data is a very important measure, a model with better generalization is not thereby the "ground truth model" of the real-world data at hand. Finally, while our model is simple, it is well founded and we show that it is far from being computationally trivial. Therefore, we believe it merits analysis in its own right.

Figure 8: Matrix of influence weights for games learnt from all 100 senators, from the first session of the 104th congress (left), first session of the 107th congress (center) and second session of the 110th congress (right), by using our independent (a) and simultaneous (b) logistic regression methods. A row represents how every other senator influences the senator in that row. Positive influences are shown in blue, negative influences are shown in red. Democrats are shown in the top/left corner, while Republicans are shown in the bottom/right corner. Note that the simultaneous method produces structures that are sparser than those of its independent counterpart. Partial view of the graph for simultaneous logistic regression (c). Most directly-influential (d) and least directly-influenceable (e) senators. Regularization parameter ρ = 0.0006.

Figure 9: Direct influence between parties and influences from Obama and McCain. Games were learnt from all 100 senators from the 101st congress (Jan 1989) to the 111th congress (Dec 2010) by using our simultaneous logistic regression method. Direct influence between senators of the same party is stronger than between senators of different parties, and the latter is decreasing over time. In the last sessions, influence from Obama to Republicans increased, and influence from McCain to both parties decreased. Regularization parameter ρ = 0.0006.

The special class of graphical games considered here is related to the well-known linear threshold model (LTM) in sociology (Granovetter, 1978), recently very popular within the social network and theoretical computer science communities (Kleinberg, 2007). LTMs are usually studied as the basis for some kind of diffusion process. A typical problem is the identification of the most influential individuals in a social network. An LTM is not in itself a game-theoretic model and, in fact, Granovetter himself argues against this view in the context of the setting and the type of questions in which he was most interested (Granovetter, 1978). To the best of our knowledge, subsequent work on LTMs has not taken a strictly game-theoretic view either. Our model is also related to a particular model of discrete choice with social interactions in econometrics (see, e.g., Brock and Durlauf, 2001). The main difference is that we take a strictly non-cooperative game-theoretic approach within the classical "static"/one-shot game framework and do not use a random utility model. In addition, we do not make the assumption of rational expectations, which is equivalent to assuming that all players use exactly the same mixed strategy. As an aside, regarding learning of information diffusion models over social networks, Saito et al. (2010) consider a dynamic (continuous time) LTM that has only positive influence weights and a randomly generated threshold value.

There is still quite a bit of debate as to the appropriateness of game-theoretic equilibrium concepts to model individual human behavior in a social context. Camerer's book on behavioral game theory (Camerer, 2003) addresses some of the issues. We point out that there is a broader view of behavioral data, beyond those generated by individual human behavior (e.g. institutions such as nations and industries, or engineered systems such as autonomous-response devices in residential or commercial properties that are programmed to control electricity usage based on user preferences). Our interpretation of Camerer's position is not that Nash equilibrium is universally a bad predictor but that it is not consistently the best, for reasons that are still not well understood. This point is best illustrated in Chapter 3, Figure 3.1 of Camerer (2003). Quantal response equilibrium (QRE) has been proposed as an alternative to Nash in the context of behavioral game theory. Models based on QRE have been shown to be superior during initial play in some experimental settings, but most experimental work assumes that the game's payoff matrices are known and only the "precision parameter" is estimated (e.g., Wright and Leyton-Brown, 2010). Finally, most of the human-subject experiments in behavioral game theory involve only a handful of players, and the scalability of those results to games with many players is unclear.

In this work we considered pure-strategy Nash equilibria only. Note that the universality of mixed-strategy Nash equilibria does not diminish the importance of pure-strategy equilibria in game theory. Indeed, a debate still exists within the game theory community as to the justification for randomization, especially in human contexts. We decided to ignore mixed strategies due to the significant added complexity. Note that we learn exclusively from observed joint actions, and therefore we cannot assume knowledge of the internal mixed strategies of players. We could generalize our model to allow for mixed strategies by defining a process in which a joint mixed strategy V from the set of mixed-strategy Nash equilibria (or its complement) is drawn according to some distribution, and then a (pure-strategy) realization x is drawn from V, corresponding to the observed joint actions.

In this paper we considered a "global" noise process, governed by a probability $q$ of selecting an equilibrium. Potentially better and more natural "local" noise processes are possible, at the expense of producing a significantly more complex generative model than the one considered in this paper. For instance, we could use a noise process that is formed of many independent, individual noise processes, one for each player. As an example, consider the generative model in which we first select an equilibrium $\mathbf{x}$ of the game and then each player $i$, independently, acts according to $x_i$ with probability $q_i$ and switches its action with probability $1 - q_i$. The problem with such a model is that it leads to a significantly more complex expression for the generative model and thus for the likelihood function. This is in contrast to the simplicity afforded by the generative model with the more global noise process defined above.
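The contrast between the two noise processes is easiest to see in a sampler. The sketch below assumes the equilibrium set is already known (for small n it can be enumerated by brute force) and that $0 < |\mathcal{NE}(\mathcal{G})| < 2^n$; it draws the equilibrium uniformly in both models, which is an assumption for the local variant, and the function names are hypothetical.

```python
import itertools

import numpy as np


def sample_global(equilibria, n, q, num_samples, rng):
    """Global noise: w.p. q draw a PSNE uniformly, otherwise a non-equilibrium joint action uniformly."""
    eq_set = {tuple(int(v) for v in x) for x in equilibria}
    non_eq = [x for x in itertools.product((-1, 1), repeat=n) if x not in eq_set]
    if not eq_set or not non_eq:
        raise ValueError("the model assumes 0 < |NE(G)| < 2^n")
    eq, non_eq = np.array(sorted(eq_set)), np.array(non_eq)
    from_eq = rng.random(num_samples) < q
    samples = np.empty((num_samples, n), dtype=int)
    samples[from_eq] = eq[rng.integers(len(eq), size=int(from_eq.sum()))]
    samples[~from_eq] = non_eq[rng.integers(len(non_eq), size=int((~from_eq).sum()))]
    return samples


def sample_local(equilibria, q_players, num_samples, rng):
    """Local noise: pick a PSNE uniformly, then player i keeps x_i w.p. q_i and flips it w.p. 1 - q_i."""
    eq = np.asarray(equilibria, dtype=int)
    picks = eq[rng.integers(len(eq), size=num_samples)]
    keep = rng.random(picks.shape) < np.asarray(q_players, dtype=float)
    return np.where(keep, picks, -picks)
```

Sampling the local model is as easy as sampling the global one; the difficulty noted above lies in the likelihood, since a joint action can be reached from many equilibria through different patterns of per-player flips.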



10 Concluding Remarks 

There are several ways of extending this research. Different upper bounds for the 0/1 loss (e.g. exponential, smooth hinge) as well as $\ell_2$-norm regularizers need to be analyzed. Learning structures in settings in which players can take more than two possible actions or follow non-linear (e.g. kernelized) strategies needs to be investigated. More sophisticated noise processes as well as mixed-strategy Nash equilibria need to be considered and studied. Finally, topic-specific and time-varying versions of our model would elicit differences in preferences and trends.
A Negative Results

A.1 Counting the Number of Equilibria is NP-hard

Here we provide a proof that establishes NP-hardness of counting the number of Nash 
equilibria, and thus also of evaluating the log-likelihood function for our generative model. 
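To see why evaluating the log-likelihood is expensive, the brute-force sketch below enumerates all $2^n$ joint actions. It assumes the PSNE condition $x_i f_i(\mathbf{x}_{-i}) \geq 0$ for every player $i$, with $f_i(\mathbf{x}_{-i}) = \mathbf{w}_{i,-i}^{\mathsf{T}}\mathbf{x}_{-i} - b_i$, and the global-noise model that assigns probability $q/|\mathcal{NE}(\mathcal{G})|$ to each equilibrium and $(1-q)/(2^n - |\mathcal{NE}(\mathcal{G})|)$ to each non-equilibrium joint action. Function names are illustrative.

```python
import itertools

import numpy as np


def psne_mask(W, b, X):
    """Boolean mask: row k of X is a PSNE iff x_i * (w_{i,-i}^T x_{-i} - b_i) >= 0 for every i."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                       # w_{i,-i} excludes the self-weight
    F = X @ W.T - np.asarray(b, dtype=float)       # F[k, i] = f_i(x_{-i}) for joint action k
    return np.all(X * F >= 0, axis=1)


def log_likelihood(W, b, q, data):
    """Sum of log p_(G,q)(x) over observed joint actions, by brute force over all 2^n actions."""
    n = len(b)
    all_x = np.array(list(itertools.product((-1, 1), repeat=n)))
    num_ne = int(psne_mask(W, b, all_x).sum())     # |NE(G)|: exponential-time enumeration
    if not 0 < num_ne < 2 ** n:
        raise ValueError("the model assumes 0 < |NE(G)| < 2^n")
    in_ne = psne_mask(W, b, np.asarray(data))
    probs = np.where(in_ne, q / num_ne, (1 - q) / (2 ** n - num_ne))
    return float(np.log(probs).sum())
```

The enumeration is the bottleneck: memory and time grow as $2^n$, which is consistent with restricting log-likelihood reports to small player sets.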



A #P-hardness proof was originally provided by Irfan and Ortiz (2011); here we present a related proof for completeness. The reduction is from the set partition problem for a specific instance of a single non-absolutely-indifferent player.

Recall the set partition problem: given a multiset of $n$ positive numbers $\{a_1, \ldots, a_n\}$, SetPartition(a) answers "yes" if and only if it is possible to partition the numbers into two disjoint subsets $S_1$ and $S_2$ such that $S_1 \cap S_2 = \emptyset$, $S_1 \cup S_2 = \{1, \ldots, n\}$ and $\sum_{i \in S_1} a_i - \sum_{i \in S_2} a_i = 0$; otherwise it answers "no". The set partition problem is equivalent to the subset sum problem, in which, given a set of positive numbers $\{a_1, \ldots, a_n\}$ and a target sum $c > 0$, SubSetSum(a, c) answers "yes" if and only if there is a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} a_i = c$; otherwise it answers "no". The equivalence between set partition and subset sum follows from $\mathrm{SetPartition}(\mathbf{a}) = \mathrm{SubSetSum}(\mathbf{a}, \frac{1}{2}\sum_i a_i)$.
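The identity $\mathrm{SetPartition}(\mathbf{a}) = \mathrm{SubSetSum}(\mathbf{a}, \frac{1}{2}\sum_i a_i)$ translates directly into code. The sketch below assumes positive integer inputs and uses the standard pseudo-polynomial reachable-sums dynamic program for subset sum; it only illustrates the definitions (the problem remains NP-hard in the bit length of the input), and the function names are hypothetical.

```python
def subset_sum(a, c):
    """SubSetSum(a, c): is there a subset S with sum_{i in S} a_i = c? (positive integers a_i)"""
    reachable = {0}                      # sums achievable with the numbers seen so far
    for value in a:
        reachable |= {s + value for s in reachable if s + value <= c}
    return c in reachable


def set_partition(a):
    """SetPartition(a) = SubSetSum(a, (1/2) sum_i a_i); an odd total can never be split evenly."""
    total = sum(a)
    return total % 2 == 0 and subset_sum(a, total // 2)


print(set_partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} and {1, 1, 2, 1} both sum to 5
print(set_partition([1, 2]))              # False
```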

For clarity of exposition, we drop the subindices in the following lemma. Let $\mathbf{w} = \mathbf{w}_{i,-i} \in \mathbb{R}^{n-1}$ and $b = b_i \in \mathbb{R}$.



Lemma 23. The set partition problem reduces to the problem of counting Nash equilibria considered in Claim ii of Lemma 20. More specifically, given $(\forall i)\ w_i > 0$ and $b = 0$, answering whether $\sum_{\mathbf{x}} \mathbb{1}[\mathbf{w}^{\mathsf{T}}\mathbf{x} - b = 0] > 0$ is equivalent to answering SetPartition(w).

Proof. Let $S_1(\mathbf{x}) = \{i \mid x_i = +1\}$ and $S_2(\mathbf{x}) = \{i \mid x_i = -1\}$. We can rewrite $\sum_{\mathbf{x}} \mathbb{1}[\mathbf{w}^{\mathsf{T}}\mathbf{x} - b = 0]$ as a sum of set partition conditions, i.e. $\sum_{\mathbf{x}} \mathbb{1}[\sum_{i \in S_1(\mathbf{x})} w_i - \sum_{i \in S_2(\mathbf{x})} w_i = 0]$. Therefore, if no tuple $\mathbf{x}$ fulfills the condition, the sum is zero and SetPartition(w) answers "no". On the other hand, if at least one tuple $\mathbf{x}$ fulfills the condition, the sum is greater than zero and SetPartition(w) answers "yes". □
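The count in Lemma 23 can be evaluated by brute force for small instances, which makes the correspondence between sign vectors and partitions concrete. This sketch assumes integer weights so that the test $\mathbf{w}^{\mathsf{T}}\mathbf{x} = 0$ is exact; the function name is hypothetical.

```python
import itertools


def count_balanced_sign_vectors(w):
    """Return sum over x in {-1,+1}^m of 1[w^T x = 0], i.e. the b = 0 case of Lemma 23.

    Each such x splits the indices into S_1(x) = {i : x_i = +1} and
    S_2(x) = {i : x_i = -1} with equal total weight, so the count is positive
    exactly when SetPartition(w) answers "yes".
    """
    return sum(
        1
        for x in itertools.product((-1, 1), repeat=len(w))
        if sum(wi * xi for wi, xi in zip(w, x)) == 0
    )


print(count_balanced_sign_vectors([1, 1, 2]))  # 2: x = (+1, +1, -1) and (-1, -1, +1)
print(count_balanced_sign_vectors([1, 2]))     # 0: no equal-weight split exists
```

Solutions always come in pairs $\mathbf{x}$ and $-\mathbf{x}$ (swapping $S_1$ and $S_2$), so the count is even.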



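A.2 Computing the Pseudo-Likelihood is NP-hard

We show that evaluating the pseudo-likelihood function for our generative model is NP-hard. First, consider a non-trivial influence game $\mathcal{G}$ in which our generative model simplifies to

$$p_{(\mathcal{G},q)}(\mathbf{x}) = q \frac{\mathbb{1}[\mathbf{x} \in \mathcal{NE}(\mathcal{G})]}{|\mathcal{NE}(\mathcal{G})|} + (1 - q) \frac{1 - \mathbb{1}[\mathbf{x} \in \mathcal{NE}(\mathcal{G})]}{2^n - |\mathcal{NE}(\mathcal{G})|}.$$

Furthermore, assume the game $\mathcal{G} = (\mathbf{W}, \mathbf{b})$ has a single non-absolutely-indifferent player $i$ and absolutely-indifferent players $\forall j \neq i$. Let $f_i(\mathbf{x}_{-i}) \equiv \mathbf{w}_{i,-i}^{\mathsf{T}}\mathbf{x}_{-i} - b_i$. By Claim i of Lemma 20, we have $\mathbb{1}[\mathbf{x} \in \mathcal{NE}(\mathcal{G})] = \mathbb{1}[x_i f_i(\mathbf{x}_{-i}) \geq 0]$ and therefore

$$p_{(\mathcal{G},q)}(\mathbf{x}) = q \frac{\mathbb{1}[x_i f_i(\mathbf{x}_{-i}) \geq 0]}{|\mathcal{NE}(\mathcal{G})|} + (1 - q) \frac{1 - \mathbb{1}[x_i f_i(\mathbf{x}_{-i}) \geq 0]}{2^n - |\mathcal{NE}(\mathcal{G})|}.$$

The conditional of player $i$ in the pseudo-likelihood, $p_{(\mathcal{G},q)}(x_i \mid \mathbf{x}_{-i})$, compares $\mathbf{x}$ with the joint action in which $x_i$ is flipped; whenever $f_i(\mathbf{x}_{-i}) \neq 0$, exactly one of the two is an equilibrium, so this conditional depends on $|\mathcal{NE}(\mathcal{G})|$. Finally, by Lemma 23, computing $|\mathcal{NE}(\mathcal{G})|$ is NP-hard even for this specific instance of a single non-absolutely-indifferent player.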
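A brute-force sketch of the pseudo-likelihood makes the dependence on $|\mathcal{NE}(\mathcal{G})|$ visible. It assumes the global-noise model above with $0 < |\mathcal{NE}(\mathcal{G})| < 2^n$ and the PSNE condition $x_i f_i(\mathbf{x}_{-i}) \geq 0$; each conditional is computed as $p(\mathbf{x}) / (p(\mathbf{x}) + p(\mathbf{x}'))$, where $\mathbf{x}'$ flips one player's action. The function name is hypothetical.

```python
import itertools

import numpy as np


def pseudo_log_likelihood(W, b, q, x):
    """log prod_j p(x_j | x_{-j}) under the global-noise model, by brute force."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    b = np.asarray(b, dtype=float)
    n = len(b)

    def is_psne(y):
        return bool(np.all(y * (W @ y - b) >= 0))

    # |NE(G)| requires enumerating all 2^n joint actions: this is the hard step.
    num_ne = sum(is_psne(np.array(y)) for y in itertools.product((-1, 1), repeat=n))
    if not 0 < num_ne < 2 ** n:
        raise ValueError("the model assumes 0 < |NE(G)| < 2^n")

    def prob(y):
        return q / num_ne if is_psne(y) else (1 - q) / (2 ** n - num_ne)

    x = np.asarray(x, dtype=float)
    total = 0.0
    for j in range(n):
        flipped = x.copy()
        flipped[j] = -flipped[j]
        # p(x_j | x_{-j}) = p(x) / (p(x) + p(x with x_j flipped))
        total += np.log(prob(x) / (prob(x) + prob(flipped)))
    return float(total)
```

For a game in which only player 0 has non-zero weights, say $\mathbf{w}_{0,-0} = (1, 1)$ and $\mathbf{b} = \mathbf{0}$ with $n = 3$, there are 6 equilibria; with $q = 0.5$ and $\mathbf{x} = (+1, +1, +1)$, player 0's conditional is $(0.5/6) / (0.5/6 + 0.5/2) = 0.25$, a value that changes with $|\mathcal{NE}(\mathcal{G})|$.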



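A.3 Counting the Number of Equilibria is not (Lipschitz) Continuous

We show that small changes in the parameters $\mathcal{G} = (\mathbf{W}, \mathbf{b})$ can produce big changes in $|\mathcal{NE}(\mathcal{G})|$. For instance, consider two games $\mathcal{G}_k = (\mathbf{W}_k, \mathbf{b}_k)$, where $\mathbf{W}_1 = \mathbf{0}, \mathbf{b}_1 = \mathbf{0}, |\mathcal{NE}(\mathcal{G}_1)| = 2^n$ and $\mathbf{W}_2 = \varepsilon(\mathbf{1}\mathbf{1}^{\mathsf{T}} - \mathbf{I}), \mathbf{b}_2 = \mathbf{0}, |\mathcal{NE}(\mathcal{G}_2)| = 2$ for $\varepsilon > 0$. For $\varepsilon \to 0$, any $\ell_p$-norm $\|\mathbf{W}_1 - \mathbf{W}_2\|_p \to 0$ but $|\mathcal{NE}(\mathcal{G}_1)| - |\mathcal{NE}(\mathcal{G}_2)| = 2^n - 2$ remains constant.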
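The example can be reproduced numerically: as $\varepsilon$ shrinks, the parameter distance vanishes while the gap in equilibrium counts stays at $2^n - 2$. This is a small brute-force sketch assuming the PSNE condition $x_i f_i(\mathbf{x}_{-i}) \geq 0$; $\varepsilon$ is chosen as a power of two only so that floating-point arithmetic stays exact.

```python
import itertools

import numpy as np


def count_psne(W, b):
    """|NE(G)| for the influence game (W, b), by enumerating all 2^n joint actions."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    X = np.array(list(itertools.product((-1, 1), repeat=len(b))))
    F = X @ W.T - np.asarray(b, dtype=float)     # F[k, i] = f_i(x_{-i})
    return int(np.all(X * F >= 0, axis=1).sum())


n = 6
b0 = np.zeros(n)
W1 = np.zeros((n, n))
for k in (5, 10, 20, 40):
    eps = 2.0 ** -k  # a power of two keeps every f_i(x_{-i}) exact in floating point
    W2 = eps * (np.ones((n, n)) - np.eye(n))
    gap = count_psne(W1, b0) - count_psne(W2, b0)
    print(f"eps = 2^-{k}: max |W1 - W2| = {np.abs(W1 - W2).max():.1e}, count gap = {gap}")
```

Every line reports a gap of $2^6 - 2 = 62$, while the largest entry of $\mathbf{W}_1 - \mathbf{W}_2$ shrinks with $\varepsilon$.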



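A.4 The Log-Partition Function of an Ising Model is a Trivial Bound for Counting the Number of Equilibria

Let $f_i(\mathbf{x}_{-i}) = \mathbf{w}_{i,-i}^{\mathsf{T}}\mathbf{x}_{-i} - b_i$. Since $\mathbb{1}[z \geq 0] \leq e^z$ for every $z$,

$$|\mathcal{NE}(\mathcal{G})| = \sum_{\mathbf{x}} \prod_i \mathbb{1}[x_i f_i(\mathbf{x}_{-i}) \geq 0] \leq \sum_{\mathbf{x}} \prod_i e^{x_i f_i(\mathbf{x}_{-i})} = \sum_{\mathbf{x}} e^{\mathbf{x}^{\mathsf{T}}\mathbf{W}\mathbf{x} - \mathbf{b}^{\mathsf{T}}\mathbf{x}} = \mathcal{Z}(\tfrac{1}{2}(\mathbf{W} + \mathbf{W}^{\mathsf{T}}), \mathbf{b}),$$

where $\mathcal{Z}$ denotes the partition function of an Ising model. Given convexity of $\mathcal{Z}$ (Koller and Friedman, 2009) and that the gradient vanishes at $\mathbf{W} = \mathbf{0}, \mathbf{b} = \mathbf{0}$, we know that $\mathcal{Z}(\frac{1}{2}(\mathbf{W} + \mathbf{W}^{\mathsf{T}}), \mathbf{b}) \geq 2^n$, which is the maximum $|\mathcal{NE}(\mathcal{G})|$.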
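The bound can be checked numerically for small games. This brute-force sketch assumes $\mathcal{Z}(\mathbf{S}, \mathbf{b}) = \sum_{\mathbf{x}} e^{\mathbf{x}^{\mathsf{T}}\mathbf{S}\mathbf{x} - \mathbf{b}^{\mathsf{T}}\mathbf{x}}$, the parameterization implied by the derivation above, and zero self-weights; the helper name is hypothetical.

```python
import itertools

import numpy as np


def count_psne_and_partition(W, b):
    """Return |NE(G)| and Z(1/2 (W + W^T), b) = sum_x exp(x^T S x - b^T x), by brute force."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    b = np.asarray(b, dtype=float)
    X = np.array(list(itertools.product((-1, 1), repeat=len(b))))
    F = X @ W.T - b                                   # F[k, i] = f_i(x_{-i})
    num_ne = int(np.all(X * F >= 0, axis=1).sum())
    S = 0.5 * (W + W.T)
    energy = np.einsum("ki,ij,kj->k", X, S, X) - X @ b   # x^T S x - b^T x = sum_i x_i f_i
    return num_ne, float(np.exp(energy).sum())


rng = np.random.default_rng(1)
for _ in range(5):
    k, Z = count_psne_and_partition(rng.normal(size=(5, 5)), rng.normal(size=5))
    print(k, round(Z, 1), k <= Z, Z >= 2 ** 5)
```

Since $\mathcal{Z} \geq 2^n$ always, the bound never says more than the trivial fact $|\mathcal{NE}(\mathcal{G})| \leq 2^n$.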
B Simultaneous Logistic Loss



For any decreasing loss $\ell(z)$, the identity $\max_i \ell(z_i) = \ell(\min_i z_i)$ holds. Hence, we can either upper-bound the max function by the logsumexp function or lower-bound the min function by a negative logsumexp. We chose the latter option for the logistic loss for the following reasons. Claim i of the following technical lemma shows that lower-bounding min generates a loss that is strictly less than upper-bounding max. Claim ii shows that lower-bounding min generates a loss that is strictly less than independently penalizing each player. Claim iii shows that there are some cases in which upper-bounding max generates a loss that is strictly greater than independently penalizing each player.
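For the logistic loss, the chosen option has a closed form, $\ell(-\log \sum_i e^{-z_i}) = \log(1 + \sum_i e^{-z_i})$, which can be evaluated stably as a logsumexp over $(0, -z_1, \ldots, -z_n)$. The sketch below compares it with the independent loss $\sum_i \ell(z_i)$; reading $z_i$ as player $i$'s margin on one observed joint action is an assumption here, and the function names are hypothetical.

```python
import numpy as np


def logistic_loss(z):
    """l(z) = log(1 + e^{-z}), elementwise and numerically stable."""
    return np.logaddexp(0.0, -np.asarray(z, dtype=float))


def simultaneous_logistic_loss(z):
    """l(-log sum_i e^{-z_i}) = log(1 + sum_i e^{-z_i}), a smooth surrogate for max_i l(z_i)."""
    z = np.asarray(z, dtype=float)
    return float(np.logaddexp.reduce(np.concatenate(([0.0], -z))))


def independent_logistic_loss(z):
    """sum_i l(z_i): each player penalized separately."""
    return float(logistic_loss(z).sum())


print(simultaneous_logistic_loss([0.0, 0.0]), independent_logistic_loss([0.0, 0.0]))  # log 3 < log 4
```

Working in log space with `np.logaddexp` avoids overflow when some $z_i$ is very negative, where $e^{-z_i}$ would not be representable.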

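Lemma 24. For the logistic loss $\ell(z) = \log(1 + e^{-z})$ and a set of $n > 1$ numbers $\{z_1, \ldots, z_n\}$:

i. $(\forall z_1, \ldots, z_n)\ \max_i \ell(z_i) \leq \ell(-\log \sum_i e^{-z_i}) < \log \sum_i e^{\ell(z_i)} \leq \max_i \ell(z_i) + \log n$

ii. $(\forall z_1, \ldots, z_n)\ \ell(-\log \sum_i e^{-z_i}) < \sum_i \ell(z_i)$   (23)

iii. $(\exists z_1, \ldots, z_n)\ \log \sum_i e^{\ell(z_i)} > \sum_i \ell(z_i)$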

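Proof. Given a set of numbers $\{a_1, \ldots, a_n\}$, the max function is bounded by the logsumexp function by $\max_i a_i \leq \log \sum_i e^{a_i} \leq \max_i a_i + \log n$ (Boyd and Vandenberghe, 2006). Equivalently, the min function is bounded by $\min_i a_i - \log n \leq -\log \sum_i e^{-a_i} \leq \min_i a_i$.

These identities allow us to prove two inequalities in Claim i, i.e. $\max_i \ell(z_i) = \ell(\min_i z_i) \leq \ell(-\log \sum_i e^{-z_i})$ and $\log \sum_i e^{\ell(z_i)} \leq \max_i \ell(z_i) + \log n$. To prove the remaining inequality $\ell(-\log \sum_i e^{-z_i}) < \log \sum_i e^{\ell(z_i)}$, note that for the logistic loss $\ell(-\log \sum_i e^{-z_i}) = \log(1 + \sum_i e^{-z_i})$ and $\log \sum_i e^{\ell(z_i)} = \log(n + \sum_i e^{-z_i})$. Since $n > 1$, strict inequality holds.

To prove Claim ii, we need to show that $\ell(-\log \sum_i e^{-z_i}) = \log(1 + \sum_i e^{-z_i}) < \sum_i \ell(z_i) = \sum_i \log(1 + e^{-z_i})$. This is equivalent to $1 + \sum_i e^{-z_i} < \prod_i (1 + e^{-z_i}) = \sum_{\mathbf{c} \in \{0,1\}^n} e^{-\mathbf{c}^{\mathsf{T}}\mathbf{z}} = 1 + \sum_i e^{-z_i} + \sum_{\mathbf{c} \in \{0,1\}^n, \mathbf{1}^{\mathsf{T}}\mathbf{c} > 1} e^{-\mathbf{c}^{\mathsf{T}}\mathbf{z}}$. Finally, we have $\sum_{\mathbf{c} \in \{0,1\}^n, \mathbf{1}^{\mathsf{T}}\mathbf{c} > 1} e^{-\mathbf{c}^{\mathsf{T}}\mathbf{z}} > 0$ because the exponential function is strictly positive and, since $n > 1$, the sum is non-empty.

To prove Claim iii, it suffices to find a set of numbers $\{z_1, \ldots, z_n\}$ for which $\log \sum_i e^{\ell(z_i)} = \log(n + \sum_i e^{-z_i}) > \sum_i \ell(z_i) = \sum_i \log(1 + e^{-z_i})$. This is equivalent to $n + \sum_i e^{-z_i} > \prod_i (1 + e^{-z_i})$. By setting $(\forall i)\ z_i = \log n$, we reduce the claim we want to prove to $n + 1 > (1 + \frac{1}{n})^n$. Strict inequality holds for $n > 1$, since $(1 + \frac{1}{n})^n < e < 3 \leq n + 1$. Furthermore, note that $\lim_{n \to +\infty} (1 + \frac{1}{n})^n = e$. □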
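The three claims of Lemma 24 can be spot-checked numerically. This sketch evaluates the four quantities in the lemma with stable log-space arithmetic on random inputs and on the witness $z_i = \log n$ used for Claim iii; the small tolerance on the non-strict inequalities only absorbs floating-point rounding, and the helper names are hypothetical.

```python
import numpy as np


def ell(z):
    """Logistic loss l(z) = log(1 + e^{-z}), elementwise and numerically stable."""
    return np.logaddexp(0.0, -np.asarray(z, dtype=float))


def lemma24_terms(z):
    """Return max_i l(z_i), l(-log sum_i e^{-z_i}), log sum_i e^{l(z_i)}, sum_i l(z_i)."""
    z = np.asarray(z, dtype=float)
    losses = ell(z)
    lower_min = np.logaddexp.reduce(np.concatenate(([0.0], -z)))  # log(1 + sum_i e^{-z_i})
    upper_max = np.logaddexp.reduce(losses)                       # log(n + sum_i e^{-z_i})
    return losses.max(), lower_min, upper_max, losses.sum()


def check_lemma24(trials=1000, seed=0, tol=1e-12):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        n = int(rng.integers(2, 8))
        mx, lo, up, indep = lemma24_terms(rng.normal(scale=3.0, size=n))
        assert mx <= lo + tol and lo < up and up <= mx + np.log(n) + tol  # Claim i
        assert lo < indep                                                # Claim ii
    for n in range(2, 50):
        _, _, up, indep = lemma24_terms(np.full(n, np.log(n)))           # z_i = log n
        assert up > indep                                                # Claim iii
    return True


print(check_lemma24())
```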